Consider an Erdős-Rényi random graph in which each edge is present independently with probability 1/2, except for a subset $C_N$ of the vertices that form a clique (a completely connected subgraph). We consider the problem of identifying the clique, given a realization of such a random graph.

The best known algorithm provably finds the clique in linear time with high probability, provided $|C_N| > 1.261\sqrt{N}$ [DGGP11]. Spectral methods can be shown to fail on cliques smaller than $\sqrt{N}$. In this paper we describe a nearly linear time algorithm that succeeds with high probability for $|C_N| > (1+\varepsilon)\sqrt{N/e}$ for any $\varepsilon > 0$. This is the first algorithm that provably improves over spectral methods.

We further generalize the hidden clique problem to other background graphs (the standard case corresponding to the complete graph on $N$ vertices). For large girth regular graphs of degree $(\Delta+1)$ we prove that 'local' algorithms succeed if $|C_N| > (1+\varepsilon)N/\sqrt{e\Delta}$ and fail if $|C_N| < (1-\varepsilon)N/\sqrt{e\Delta}$.

1 Introduction

Numerous modern data sets have network structure, i.e. the dataset consists of observations on pairwise relationships among a set of $N$ objects. A recurring computational problem in this context is that of identifying a small subset of 'atypical' observations against a noisy background. This paper develops a new type of algorithm and analysis for this problem. In particular we improve over the best methods for finding a hidden clique in an otherwise random graph.

Let $G_N = ([N], E_N)$ be a graph over the vertex set $[N] = \{1,2,\dots,N\}$, and let $Q_0$, $Q_1$ be two distinct probability distributions over the real line $\mathbb{R}$. Finally, let $C_N \subseteq [N]$ be a subset of vertices, uniformly random given its size $|C_N|$. For each edge $(i,j) \in E_N$ we draw an independent random variable $W_{ij}$ with distribution $W_{ij} \sim Q_1$ if both $i \in C_N$ and $j \in C_N$, and $W_{ij} \sim Q_0$ otherwise. The hidden set problem is to identify the set $C_N$ given knowledge of the graph $G_N$ and the observations $W = (W_{ij})_{(i,j)\in E_N}$. We will refer to $G_N$ as the background graph. We emphasize that $G_N$ is non-random and that it carries no information about the hidden set $C_N$.

In the rest of this introduction we will assume, for simplicity, $Q_1 = \delta_{+1}$ and $Q_0 = \frac{1}{2}\delta_{+1} + \frac{1}{2}\delta_{-1}$. In other words, edges $(i,j) \in E_N$ with endpoints $\{i,j\} \subseteq C_N$ are labeled with $W_{ij} = +1$. Other edges $(i,j) \in E_N$ have a uniformly random label $W_{ij} \in \{+1,-1\}$. Our general treatment in the next sections covers arbitrary subgaussian distributions $Q_0$ and $Q_1$ and does not require these distributions to be known in advance.
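To make the data model concrete, here is a minimal Python sketch that samples an instance $(C_N, W)$ of the hidden set problem on an arbitrary background graph given as an edge list. It is only an illustration: the function `sample_hidden_set_instance`, the sampler callbacks and the 0-based vertex labels are our own assumptions, not notation from the paper. The two samplers implement the $\pm1$ special case just described.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]
Sampler = Callable[[random.Random], float]


def sample_hidden_set_instance(
    n: int,
    edges: List[Edge],
    set_size: int,
    sample_q0: Sampler,
    sample_q1: Sampler,
    rng: random.Random,
) -> Tuple[Set[int], Dict[Edge, float]]:
    """Draw C_N uniformly among subsets of size set_size, then label every edge.

    An edge (i, j) receives W_ij ~ Q1 if both endpoints lie in C_N and W_ij ~ Q0
    otherwise. Vertices are 0-based here, whereas the paper uses [N] = {1, ..., N}.
    """
    hidden = set(rng.sample(range(n), set_size))
    labels: Dict[Edge, float] = {}
    for i, j in edges:
        both_hidden = i in hidden and j in hidden
        labels[(i, j)] = sample_q1(rng) if both_hidden else sample_q0(rng)
    return hidden, labels


def q1_plus_one(rng: random.Random) -> float:
    """Q1 = delta_{+1}: edges inside the hidden set are always labeled +1."""
    return 1.0


def q0_random_sign(rng: random.Random) -> float:
    """Q0 = (1/2) delta_{+1} + (1/2) delta_{-1}: a fair random sign."""
    return rng.choice((1.0, -1.0))


# Example: complete background graph on 12 vertices with a hidden set of size 4.
rng = random.Random(1)
n = 12
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
hidden, W = sample_hidden_set_instance(n, edges, 4, q0_random_sign, q1_plus_one, rng)
```

Passing all pairs $i<j$ as `edges`, as in the example, gives the complete background graph $K_N$; any other edge list gives a general background graph.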

The special case $G_N = K_N$ (with $K_N$ the complete graph) has attracted considerable attention over the last twenty years [Jer92] and is known as the hidden or planted clique problem. In this case, the background graph does not play any role, and the random variables $W = (W_{ij})_{i,j\in[N]}$ can be organized in an $N\times N$ symmetric matrix (letting, by convention, $W_{ii} = 0$). The matrix $W$ can be interpreted as the adjacency matrix of a random graph $\mathcal{R}_N$ generated as follows. Any pair of vertices $\{i,j\} \subseteq C_N$ is connected by an edge. Any other pair $\{i,j\} \not\subseteq C_N$ is instead connected independently with probability 1/2. (We use here $\{+1,-1\}$ instead of $\{1,0\}$ for the entries of the adjacency matrix. This encoding is unconventional but turns out to be mathematically convenient.) Due to the symmetry of the model, the set $C_N \subseteq [N]$ does not need to be random and can be chosen arbitrarily in this case.

It is easy to see that, allowing for exhaustive search, the hidden clique can be found with high probability as soon as $|C_N| > 2(1+\varepsilon)\log_2 N$ for any $\varepsilon > 0$. This procedure has complexity $\exp[\Theta((\log N)^2)]$. Vice versa, if $|C_N| < 2(1-\varepsilon)\log_2 N$, then the clique cannot be uniquely identified.

Despite a large body of research, the best polynomial-time algorithms to date require $|C_N| > c\sqrt{N}$ to succeed with high probability. This was first achieved by Alon, Krivelevich and Sudakov [AKS98] through a spectral technique. It is useful to briefly discuss this class of methods and their limitations. Letting $u_{C_N} \in \mathbb{R}^N$ be the indicator vector on $C_N$ (i.e. the vector with entries $(u_{C_N})_i = 1$ for $i \in C_N$ and $(u_{C_N})_i = 0$ otherwise), we have

$$W = u_{C_N}u_{C_N}^{\mathsf T} + Z - Z_{C_N,C_N}. \tag{1.1}$$

Here $Z \in \mathbb{R}^{N\times N}$ is a symmetric matrix with i.i.d. entries $(Z_{ij})_{i<j}$ uniformly random in $\{+1,-1\}$, and $Z_{C_N,C_N}$ is the matrix obtained from $Z$ by zeroing all the entries $Z_{ij}$ with $\{i,j\} \not\subseteq C_N$. Denoting by $\|A\|_2$ the $\ell_2$ operator norm of a matrix $A$, we have $\|u_{C_N}u_{C_N}^{\mathsf T}\|_2 = \|u_{C_N}\|_2^2 = |C_N|$. On the other hand, a classical result by Füredi and Komlós [FK81] implies that, with high probability, $\|Z\|_2 \le c_0\sqrt{N}$ and $\|Z_{C_N,C_N}\|_2 \le c_0\sqrt{|C_N|}$. Hence, if $|C_N| > c\sqrt{N}$ with $c$ large enough, the first term in the decomposition (1.1) dominates the others. By a standard matrix perturbation argument [DK70], letting $v_1$ denote the principal eigenvector of $W$, we have $\|v_1 - u_{C_N}/\sqrt{|C_N|}\|_2 \le \varepsilon$ provided the constant $c = c(\varepsilon)$ is chosen large enough. It follows that selecting the $|C_N|$ largest entries of $v_1$ yields an estimate $\widehat C_N \subseteq [N]$ that includes at least half of the vertices of $C_N$; the other half can be subsequently identified through a simple procedure [AKS98].
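The first stage of the spectral approach can be sketched in a few lines of pure Python. This is a minimal illustration under simplifying assumptions, not the procedure of [AKS98]: it approximates $v_1$ by power iteration from the all-ones vector, keeps the $|C_N|$ largest entries, and omits the second clean-up stage. The names `planted_clique_matrix` and `spectral_estimate` are hypothetical, and the clique size in the example is chosen well above $\sqrt{N}$ so that the first term of (1.1) clearly dominates.

```python
import math
import random
from typing import List, Set


def planted_clique_matrix(n: int, clique: Set[int], rng: random.Random) -> List[List[int]]:
    """Symmetric +/-1 matrix: W_ij = +1 if i, j are both in the clique, a fair sign otherwise; W_ii = 0."""
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = 1 if (i in clique and j in clique) else rng.choice((1, -1))
            W[i][j] = W[j][i] = w
    return W


def spectral_estimate(W: List[List[int]], k: int, iters: int = 50) -> Set[int]:
    """Approximate the principal eigenvector v_1 by power iteration and keep its k largest entries."""
    n = len(W)
    v = [1.0] * n  # the all-ones start typically overlaps positively with u_C, so we converge to +v_1
    for _ in range(iters):
        v = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return set(sorted(range(n), key=lambda i: v[i], reverse=True)[:k])


# Illustrative instance with |C_N| = 50, well above sqrt(N) ~ 14 for N = 200.
rng = random.Random(0)
n, k = 200, 50
clique = set(rng.sample(range(n), k))
W = planted_clique_matrix(n, clique, rng)
estimate = spectral_estimate(W, k)
print(len(estimate & clique), "of", k, "clique vertices recovered")
```

Shrinking `k` towards $\sqrt{N}$ in this sketch is a simple way to observe the degradation that Proposition 1.1 makes precise.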

The spectral approach does not exploit the fact that $|C_N|$ is much smaller than $N$ or, in other words, the fact that $u_{C_N}$ is a sparse vector. Recent results in random matrix theory suggest that it is unlikely that the same approach can be pushed to work for $|C_N| < (1-\varepsilon)\sqrt{N}$ (for any $\varepsilon > 0$). For instance, the following is a consequence of [KY11, Theorem 2.7]. (The proof is provided in Appendix B.)

Proposition 1.1. Let $e_{C_N} = u_{C_N}/N^{1/4}$ be the normalized indicator vector on the vertex set $C_N$, and $Z$ a Wigner random matrix with subgaussian entries such that $\mathbb{E}\{Z_{ij}\} = 0$, $\mathbb{E}\{Z_{ij}^2\} = 1/N$. Denote by $v_1, v_2, v_3, \dots, v_\ell$ the eigenvectors of $W = e_{C_N}e_{C_N}^{\mathsf T} + Z$, corresponding to the $\ell$ largest eigenvalues.

Assume $|C_N| > (1+\varepsilon)\sqrt{N}$ for some $\varepsilon > 0$. Then, with high probability, $\langle v_1, e_{C_N}\rangle > \min(\sqrt{\varepsilon},\varepsilon)/2$. Vice versa, assume $|C_N| < (1-\varepsilon)\sqrt{N}$. Then, with high probability, for any fixed constant $\delta > 0$, $|\langle v_i, e_{C_N}\rangle| < cN^{-1/2+\delta}$ for all $i \in \{1,\dots,\ell\}$ and some $c = c(\varepsilon,\ell)$.



In other words, for $|C_N|$ below $\sqrt{N}$ and any fixed $\ell$, the first $\ell$ principal eigenvectors of $W$ are essentially no more correlated with the set $C_N$ than a random unit vector. A natural reaction to this limitation is to try to exploit the sparsity of $u_{C_N}$. Ames and Vavasis [AV11] studied a convex optimization formulation wherein $W$ is approximated by a sparse low-rank matrix. These two objectives (sparsity and rank) are convexified through the usual $\ell_1$-norm and nuclear-norm relaxations. These authors prove that this convex relaxation approach is successful with high probability, provided $|C_N| > c\sqrt{N}$ for an unspecified constant $c$. A similar result follows from the robust PCA analysis of Candès, Li, Ma, and Wright [CLMW11].

Dekel, Gurel-Gurevich and Peres [DGGP11] developed simple iterative schemes with $O(N^2)$ complexity (see also [FR10] for similar approaches). These authors prove that the best of their algorithms succeeds with high probability provided $|C_N| > 1.261\sqrt{N}$. Finally, [AKS98] also provide a simple procedure that, given an algorithm that is successful for $|C_N| > c\sqrt{N}$, produces an algorithm that is successful for $|C_N| > c\sqrt{N/2}$, albeit with complexity $\sqrt{N}$ times larger.

Our first result proves that the hidden clique can be identified in nearly linear time well below the spectral threshold $\sqrt{N}$; see Proposition 1.1.

Theorem 1. Assume $|C_N| > (1+\varepsilon)\sqrt{N/e}$, for some $\varepsilon > 0$ independent of $N$. Then there exists an $O(N^2\log N)$ time algorithm that identifies the hidden clique $C_N$ with high probability.

In Section 2 we will state and prove a generalization of this theorem for arbitrary (not necessarily known) distributions $Q_0$, $Q_1$.

Our algorithm is based on a quite different philosophy with respect to previous approaches to the same problem. We aim at estimating optimally the set $C_N$ by computing the posterior probability that $i \in C_N$, given edge data $W$. This is, in general, #P-hard and possibly infeasible if $Q_0$, $Q_1$ are unknown. We therefore consider an algorithm derived from belief propagation, a heuristic machine learning method for approximating posterior probabilities in graphical models. We develop a rigorous analysis of this algorithm that is asymptotically exact as $N \to \infty$, and prove that indeed the algorithm converges to the correct set of vertices $C_N$ for $|C_N| > (1+\varepsilon)\sqrt{N/e}$. Vice versa, the algorithm converges to an uninformative fixed point for $|C_N| < (1-\varepsilon)\sqrt{N/e}$.



Given Theorem 1, it is natural to ask whether the threshold $\sqrt{N/e}$ has a fundamental computational meaning or is instead only relevant for our specific algorithm. Recently, [FGR+12] proved complexity lower bounds for the hidden clique model, in a somewhat different framework. In the formulation of [FGR+12], one can query columns of $W$, and a new realization from the distribution of $W$ given $C_N$ is instantiated at each query. Assuming that each column is queried $O(1)$ times, their lower bound would require $|C_N| > N^{1/2-\delta}$. While this analysis can possibly be adapted to our setting, it is unlikely to yield a lower bound of the form $|C_N| > c\sqrt{N}$ with a sharp constant $c$.

Instead, we take a different point of view, and consider the hidden set problem on a general background graph $G_N$. Let us emphasize once more that $G_N$ is non-random and that all the information about the hidden set is carried by the edge labels $W = (W_{ik})_{(i,k)\in E_N}$. In addition, we attach to the edges a collection of independent labels $U = (U_{ik})_{(i,k)\in E_N}$, i.i.d. and uniform in $[0,1]$. The $U$ labels exist to provide for (possible) randomization in the algorithm. Given such a graph $G_N$ with labels $W$, $U$, a vertex $i \in [N]$ and $t \ge 0$, we let $\mathsf{Ball}_{G_N}(i;t)$ denote the subgraph of $G_N$ induced by those vertices $j \in [N]$ whose graph distance from $i$ is at most $t$. We regard $\mathsf{Ball}_{G_N}(i;t)$ as a graph rooted at $i$, with edge labels $W_{jl}$, $U_{jl}$ inherited from $G_N$.

Definition 1.2. An algorithm for the hidden set problem is said to be $t$-local if, denoting by $\widehat C_N$ its output, the membership $(i \in \widehat C_N)$ is a function of the neighborhood $\mathsf{Ball}_{G_N}(i;t)$. We say that it is local if it is $t$-local for some $t$ independent of $N$.



The concept of (randomized) local algorithms was introduced in [Ang80] and formalizes the notion of an algorithm that can be run in $O(1)$ time in a distributed network. We refer to [Lin92, NS95] for earlier contributions, and to [Suo13] for a recent survey.

We say that a sequence of graphs $\{G_N\}_{N\ge1}$ is locally tree-like if, for any $t \ge 0$, the fraction of vertices $i \in [N]$ such that $\mathsf{Ball}_{G_N}(i;t)$ is a tree converges to one as $N \to \infty$. As a standard example, random regular graphs are locally tree-like. The next result is proved in Section 5.
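The definitions of $\mathsf{Ball}_{G_N}(i;t)$ and of local tree-likeness are straightforward to compute. The sketch below (function names hypothetical; graphs given as adjacency lists of a simple undirected graph) extracts a ball by breadth-first search and tests whether it is a tree by counting edges, since a connected graph is a tree exactly when it has one edge fewer than vertices. As a toy example we use cycles, which are regular of degree 2 ($\Delta = 1$): a cycle of length $N$ has tree-shaped radius-$t$ balls exactly when $N \ge 2t+2$. Random regular graphs, the standard example in the text, are not generated here.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency lists of a simple undirected graph


def ball(adj: Graph, root: int, t: int) -> Set[int]:
    """Vertices at graph distance at most t from root (breadth-first search)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == t:
            continue  # do not expand beyond radius t
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)


def ball_is_tree(adj: Graph, root: int, t: int) -> bool:
    """Ball(root; t) is connected, so it is a tree iff its induced subgraph has |V| - 1 edges."""
    verts = ball(adj, root, t)
    n_edges = sum(1 for v in verts for w in adj[v] if w in verts) // 2
    return n_edges == len(verts) - 1


def tree_like_fraction(adj: Graph, t: int) -> float:
    """Fraction of vertices i whose radius-t ball is a tree."""
    return sum(ball_is_tree(adj, v, t) for v in adj) / len(adj)


def cycle(n: int) -> Graph:
    """Cycle on n >= 3 vertices."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


print(tree_like_fraction(cycle(6), 2), tree_like_fraction(cycle(6), 3))
```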

Theorem 2. Let $\{G_N\}_{N\ge1}$ be a sequence of locally tree-like graphs, with regular degree $(\Delta+1)$, and let $C_N \subseteq [N]$ be a uniformly random subset of the vertices of given size $|C_N|$. If $|C_N| < (1-\varepsilon)N/\sqrt{e\Delta}$ for some $\varepsilon > 0$, there exists $\xi > 0$ independent of $\Delta$ and $\varepsilon$ such that any local algorithm outputs a set of vertices $\widehat C_N$ with $\mathbb{E}[|\widehat C_N \triangle C_N|] > N\xi/\sqrt{\Delta}$ for all $N$ large enough.

Vice versa, if $|C_N| > (1+\varepsilon)N/\sqrt{e\Delta}$ for some $\varepsilon > 0$, there exist $\xi(\varepsilon) > 0$ and a local algorithm that outputs a set of vertices $\widehat C_N$ satisfying $\mathbb{E}[|\widehat C_N \triangle C_N|] < N\exp\{-\xi(\varepsilon)\sqrt{\Delta}\}$ for all $N$ large enough.

Notice that, on a bounded degree graph, the hidden set $C_N$ cannot be identified exactly with high probability. Indeed we would not be able to assign a single vertex $i$ with high probability of success, even if we knew exactly the status of all of its neighbors. On the other hand, purely random guessing yields $\mathbb{E}[|\widehat C_N \triangle C_N|] = N\,\Theta(1/\sqrt{\Delta})$. The last theorem thus establishes a threshold behavior: local algorithms can reconstruct the hidden set with small error if and only if $|C_N|$ is larger than $N/\sqrt{e\Delta}$.

Unfortunately Theorem 2 only covers the case of sparse or locally tree-like graphs: we let $N \to \infty$ at $\Delta$ fixed and then take $\Delta$ arbitrarily large. However, if we naively apply it to the case of complete background graphs $G_N = K_N$, by setting $\Delta = N-2$ (so that the degree $\Delta+1$ equals $N-1$), we get a threshold at $|C_N| \approx N/\sqrt{e(N-2)} \approx \sqrt{N/e}$, which coincides with the one in Theorem 1. This suggests that $\sqrt{N/e}$ might be a fundamental limit for solving the hidden clique problem in nearly linear time. It would be of much interest to clarify whether this is indeed the case.

The contributions of this paper can be summarized as follows: 

1. We develop a new algorithm, based on the belief propagation heuristic in machine learning, that applies to the general hidden set problem.

2. We establish a sharp analysis of the algorithm evolution, rigorously establishing that it can be used to find hidden cliques of size $\sqrt{N/e}$ in random graphs. The analysis applies to more general noise models as well.

3. We generalize the hidden set problem to arbitrary graphs. For locally tree-like graphs of degree $(\Delta+1)$, we prove that local algorithms succeed in finding the hidden set (up to small errors) if and only if its size is larger than $N/\sqrt{e\Delta}$.

The complete graph case is treated in Section 2, with technical proofs deferred to Section 3. The locally tree-like case is instead discussed in Section 4, with proofs in Section 5.



1.1 Further related work 

A rich line of research in statistics addresses the problem of identifying the non-zero entries in a sparse vector (or matrix) $x$ from observations $W = x + Z$, where $Z$ typically has i.i.d. standard Gaussian entries. In particular [ACDH05, ABBDL10, ACCD11, BDN12] study cases in which the sparsity pattern of $x$ is 'structured'. For instance, we can take $x \in \mathbb{R}^{N\times N}$ a matrix with $x_{ij} = \mu$ if $\{i,j\} \subseteq C_N$ and $x_{ij} = 0$ otherwise. This fits the framework studied in this paper, for $G_N$ the complete graph and $Q_0 = \mathsf{N}(0,1)$, $Q_1 = \mathsf{N}(\mu,1)$. This literature however disregards computational considerations. Greedy search methods were developed in several papers; see e.g. [SN08, SWPN09].

Also, the decomposition (1.1) indicates a connection with sparse principal component analysis [ZHT06, JL09, dEGJL07, dBG08]. This is the problem of finding a sparse low-rank approximation of a given data matrix $W$. Remarkably, even for sparse PCA, there is a large gap between what is statistically feasible and what is achievable by practical algorithms. Berthet and Rigollet [BR13] recently investigated the implications for sparse PCA of the assumption that the hidden clique problem is hard to solve for $|C_N| = o(\sqrt{N})$.

The algorithm we introduce for the case $G_N = K_N$ is analogous to the 'linearized BP' algorithm of [MT06, GW06], and to the approximate message passing (AMP) algorithm of [DMM09, BM11, BLM12]. These ideas have been applied to low-rank approximation in [RF12]. The present setting, however, poses several technical challenges with respect to earlier work in this area: (i) the entries of the data matrix are not i.i.d.; (ii) they are non-Gaussian with, in general, non-zero mean; (iii) we seek exact recovery instead of estimation; (iv) the sparsity set $C_N$ to be reconstructed scales sublinearly with $N$.

Finally, let us mention that a substantial literature studies the behavior of message passing algorithms on sparse random graphs [RU08, MM09]. In this paper, a large part of our technical effort is instead devoted to a similar analysis on the complete graph, in which simple local convergence arguments fail.

1.2 Notations 

Throughout the paper, $[M] = \{1,2,\dots,M\}$ denotes the set of first $M$ integers. We employ a slight abuse of notation to write $[N]\setminus i,j$ for $[N]\setminus\{i,j\}$. The indicator function is denoted by $\mathbb{I}(\cdot)$.

We write $X \sim P$ when a random variable $X$ has distribution $P$. We will sometimes write $\mathbb{E}_P$ to denote expectation with respect to the probability distribution $P$. Probability and expectation will otherwise be denoted by $\mathbb{P}$ and $\mathbb{E}$. For $a \in \mathbb{R}$, $b \in \mathbb{R}_+$, $\mathsf{N}(a,b)$ denotes the Gaussian distribution with mean $a$ and variance $b$. The cumulative distribution function of a standard Gaussian will be denoted by $\Phi(x) = \int_{-\infty}^{x} e^{-z^2/2}\,dz/\sqrt{2\pi}$.

Unless otherwise specified, we assume all edges in the graphs mentioned are undirected. We denote by $\partial i$ the neighborhood of vertex $i$ in a graph.

We will often use the phrase "for $i \in C_N$" when stating certain results. More precisely, this means that for each $N$ we are choosing an index $i_N \in C_N$, which does not depend on the edge labels $W$.

We use $c, c_0, c_1, \dots$ and $C_1, C_2, \dots$ to denote constants independent of $N$ and $|C_N|$.

Throughout, for any random variable $Z$ we will indicate by $P_Z$ its law.



2 The complete graph case: Algorithm and analysis 

In this section we consider the case in which the background graph is complete, i.e. $G_N = K_N$. Since $G_N$ does not play any role in this case, we shall omit all reference to it. We will discuss the reconstruction algorithm and its analysis, and finally state a generalization of Theorem 1 to the case of general distributions $Q_0$, $Q_1$.

2.1 Definitions 

In the present case the data consists of a symmetric matrix $W \in \mathbb{R}^{N\times N}$, with $(W_{ij})_{i<j}$ generated independently as follows. For an unknown set $C_N \subseteq [N]$ we have $W_{ij} \sim Q_1$ if $\{i,j\} \subseteq C_N$, and $W_{ij} \sim Q_0$ otherwise. Here $Q_1$ and $Q_0$ are distinct probability measures. We make the following assumptions:

I. $Q_0$ has zero mean and $Q_1$ has non-zero mean $\lambda$. Without loss of generality we shall further assume that $Q_0$ has unit variance, and that $\lambda > 0$.

II. $Q_0$ and $Q_1$ are subgaussian with common scale factor $\rho$.

It will be clear from the algorithm description that there is indeed no loss in generality in assuming that $Q_0$ has unit variance and that $\lambda$ is positive. Recall that a probability distribution $P$ is subgaussian with scale factor $\rho > 0$ if for all $y \in \mathbb{R}$ we have:

$$\mathbb{E}_{X\sim P}\big\{e^{y(X - \mathbb{E}_P X)}\big\} \le e^{\rho y^2/2}.$$

There is no loss of generality in assuming a common scale factor for $Q_0$ and $Q_1$.

The task is to identify the set $C_N$ from a realization of the matrix $W$. As discussed in the introduction, the relevant scaling is $|C_N| = \Theta(\sqrt{N})$ and we shall therefore define $\kappa_N = |C_N|/\sqrt{N}$. Further, throughout this section, we will make use of the normalized matrix

$$A = \frac{1}{\sqrt{N}}\,W. \tag{2.1}$$

In several technical steps of our analysis we shall consider a sequence of instances $\{(W_{N\times N}, C_N)\}_{N\ge1}$ indexed by the dimension $N$, such that $\lim_{N\to\infty}\kappa_N = \kappa \in (0,\infty)$. This technical assumption will be removed in the proof of our main theorem.

2.2 Message passing and state evolution 

The key innovation of our approach is the construction and analysis of a message passing algorithm that allows us to identify the hidden set $C_N$. As we demonstrate later in the paper, this algorithm can be derived from belief propagation in machine learning. However, this derivation is not necessary and the treatment here will be self-contained.

The message passing algorithm is iterative and at each step $t \in \{1,2,3,\dots\}$ produces an $N\times N$ matrix $\theta^t$ whose entry $(i,j)$ will be denoted as $\theta^t_{i\to j}$, to emphasize the fact that $\theta^t$ is not symmetric. By convention, we set $\theta^t_{i\to i} = 0$. The variables $\theta^t_{i\to j}$ will be referred to as messages, and their update rule is formally defined below.



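Definition 2.1. Let $\theta^0 \in \mathbb{R}^{N\times N}$ be an initial condition for the messages and, for each $t$, let $f(\cdot\,,t):\mathbb{R}\to\mathbb{R}$ be a scalar function. The message passing orbit corresponding to the triple $(A, f, \theta^0)$ is the sequence $\{\theta^t\}_{t\ge0}$, $\theta^t \in \mathbb{R}^{N\times N}$, defined by letting, for each $t \ge 0$,

$$\theta^{t+1}_{i\to j} = \sum_{\ell\in[N]\setminus\{i,j\}} A_{i\ell}\, f(\theta^t_{\ell\to i}, t), \qquad \forall\, i \ne j \in [N]. \tag{2.2}$$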



We also define a sequence of vectors $\{\theta^t\}_{t\ge1}$, with $\theta^t = (\theta^t_i)_{i\in[N]} \in \mathbb{R}^N$ (the entries being indexed by $i \in [N]$), by letting

$$\theta^{t+1}_i = \sum_{\ell\in[N]\setminus i} A_{i\ell}\, f(\theta^t_{\ell\to i}, t). \tag{2.3}$$

The functions $f(\cdot\,,t)$ will be chosen so that they can be evaluated in $O(1)$ operations. Each iteration can then be implemented with $O(N^2)$ operations. Indeed $(\theta^{t+1}_i)_{i\in[N]}$ can be computed in $O(N^2)$ operations as per Eq. (2.3). Subsequently we can compute $(\theta^{t+1}_{i\to j})_{i,j\in[N]}$ in $O(N^2)$ operations by noting that $\theta^{t+1}_{i\to j} = \theta^{t+1}_i - A_{ij}\, f(\theta^t_{j\to i}, t)$.
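The two-pass update just described can be checked numerically. The sketch below, in pure Python with hypothetical function names, runs Eqs. (2.2)-(2.3) from $\theta^0_{i\to j} = 1$ at $O(N^2)$ cost per iteration, and compares it with a direct evaluation of (2.2) that costs $O(N^3)$ per iteration. The test matrix and the polynomial `f` are arbitrary choices made for illustration only.

```python
import random
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Poly = Callable[[float, int], float]  # f(x, t), e.g. a polynomial in x for each t


def message_passing(A: Matrix, f: Poly, t_max: int) -> Tuple[List[float], Matrix]:
    """Run t_max >= 1 iterations of Eqs. (2.2)-(2.3), starting from theta^0_{i->j} = 1.

    msgs[i][j] holds theta^t_{i->j}; the diagonal is kept at 0 by convention.
    Each iteration costs O(N^2): compute theta^{t+1}_i by (2.3), then subtract one term.
    """
    n = len(A)
    msgs = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    theta: List[float] = []
    for t in range(t_max):
        # fv[l][i] = f(theta^t_{l->i}, t), evaluated once per directed edge.
        fv = [[0.0 if l == i else f(msgs[l][i], t) for i in range(n)] for l in range(n)]
        # Eq. (2.3): theta^{t+1}_i = sum_{l != i} A_il f(theta^t_{l->i}, t).
        theta = [sum(A[i][l] * fv[l][i] for l in range(n) if l != i) for i in range(n)]
        # Eq. (2.2) via theta^{t+1}_{i->j} = theta^{t+1}_i - A_ij f(theta^t_{j->i}, t).
        msgs = [[0.0 if i == j else theta[i] - A[i][j] * fv[j][i] for j in range(n)]
                for i in range(n)]
    return theta, msgs


def message_passing_naive(A: Matrix, f: Poly, t_max: int) -> Tuple[List[float], Matrix]:
    """Direct O(N^3)-per-iteration evaluation of Eqs. (2.2)-(2.3), used only as a cross-check."""
    n = len(A)
    msgs = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    theta: List[float] = []
    for t in range(t_max):
        old = msgs
        theta = [sum(A[i][l] * f(old[l][i], t) for l in range(n) if l != i) for i in range(n)]
        msgs = [[0.0 if i == j else
                 sum(A[i][l] * f(old[l][i], t) for l in range(n) if l not in (i, j))
                 for j in range(n)]
                for i in range(n)]
    return theta, msgs


# Cross-check on a small random symmetric matrix with entries +/- 1/sqrt(N).
rng = random.Random(3)
n = 8
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        A[i][j] = A[j][i] = rng.choice((1.0, -1.0)) / n ** 0.5


def f(x: float, t: int) -> float:
    return x * x - 0.5 * x + 0.1 * t  # an arbitrary low-degree polynomial


theta_fast, msgs_fast = message_passing(A, f, 3)
theta_naive, msgs_naive = message_passing_naive(A, f, 3)
print(max(abs(a - b) for a, b in zip(theta_fast, theta_naive)))
```

Storing $f(\theta^t_{\ell\to i}, t)$ once per directed edge is what keeps the fast version at $O(N^2)$: both (2.2) and (2.3) then reuse the same table.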

The proper choice of the functions $f(\cdot\,,t)$ plays a crucial role in achieving the claimed tradeoff between $|C_N|$ and $N$. This choice will be optimized on the basis of the general analysis developed below.

Before proceeding, it is useful to discuss briefly the intuition behind the update rule introduced in Definition 2.1. For each vertex $i$, the message $\theta^t_{i\to j}$ and the value $\theta^t_i$ are estimates of the likelihood that $i \in C_N$: they are larger for vertices that are more likely to belong to the set $C_N$. In order to develop some intuition on Definition 2.1, consider a conceptually simpler iteration operating as follows on variables $\vartheta^t = (\vartheta^t_i)_{i\in[N]}$. For each $i \in [N]$ we let $\vartheta^{t+1}_i = \sum_{\ell\in[N]} A_{i\ell}\, f(\vartheta^t_\ell; t)$. In the special case $f(\vartheta;t) = \vartheta$ we obtain the iteration $\vartheta^{t+1} = A\vartheta^t$, which is simply the power method for computing the principal eigenvector of $A$. As discussed in the introduction, this does not use in any way the information that $|C_N|$ is much smaller than $N$. We can exploit this information by taking $f(\vartheta;t)$ a rapidly increasing function of $\vartheta$ that effectively selects the vertices $i \in [N]$ with $\vartheta^t_i$ large. We will see that this is indeed what happens within our analysis.

An important feature of the message passing version (operating on messages $\theta^t_{i\to j}$) is that it admits a characterization that is asymptotically exact as $N \to \infty$. In the large-$N$ limit, the values $\theta^t_i$ (for fixed $t$) converge in distribution to Gaussian random variables with certain mean and variance. In order to state this result formally, we introduce the sequence of mean and variance parameters $\{(\mu_t, \tau_t^2)\}_{t\ge0}$ by letting $\mu_0 = 1$, $\tau_0 = 0$ and defining, for $t \ge 0$,

$$\mu_{t+1} = \lambda\kappa\, \mathbb{E}[f(\mu_t + \tau_t Z, t)], \tag{2.4}$$

$$\tau_{t+1}^2 = \mathbb{E}[f(\tau_t Z, t)^2]. \tag{2.5}$$

Here expectation is with respect to $Z \sim \mathsf{N}(0,1)$. We will refer to this recursion as state evolution.
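State evolution (2.4)-(2.5) only involves one-dimensional Gaussian integrals, so it can be evaluated numerically. The following sketch is our own illustration; the trapezoidal quadrature on $[-12,12]$ and the helper names are assumptions. With $f(\cdot,0) = 1$ and the linear choice $f(x,t) = x$ for $t \ge 1$, the recursion gives $\mu_{t+1} = \lambda\kappa\,\mu_t$ and $\tau_{t+1}^2 = \tau_t^2 = 1$. The ratio $\mu_t/\tau_t$ therefore grows only if $\lambda\kappa > 1$, i.e. $|C_N| > \sqrt{N}/\lambda$, which for $\lambda = 1$ is the spectral threshold discussed in the introduction.

```python
import math
from typing import Callable, List, Tuple


def gaussian_expectation(g: Callable[[float], float], half_width: float = 12.0, n: int = 4001) -> float:
    """E[g(Z)] for Z ~ N(0, 1), by the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        z = -half_width + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def state_evolution(f: Callable[[float, int], float], lam_kappa: float,
                    t_max: int) -> List[Tuple[float, float]]:
    """Iterate Eqs. (2.4)-(2.5) from mu_0 = 1, tau_0 = 0; return [(mu_t, tau_t^2)] for t = 0..t_max."""
    mu, tau2 = 1.0, 0.0
    history = [(mu, tau2)]
    for t in range(t_max):
        tau = math.sqrt(tau2)
        new_mu = lam_kappa * gaussian_expectation(lambda z: f(mu + tau * z, t))
        new_tau2 = gaussian_expectation(lambda z: f(tau * z, t) ** 2)
        mu, tau2 = new_mu, new_tau2
        history.append((mu, tau2))
    return history


# f(., 0) = 1, then the linear choice f(x, t) = x that underlies the power method.
linear = state_evolution(lambda x, t: 1.0 if t == 0 else x, lam_kappa=0.8, t_max=3)
print(linear)  # mu_t = 0.8**t for t >= 1, while tau_t^2 stays at 1
```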

Lemma 2.2. Let $f(u,t)$ be, for each $t \in \mathbb{N}$, a finite-degree polynomial. For each $N$, let $W \in \mathbb{R}^{N\times N}$ be a symmetric matrix distributed as per the model introduced above with $\kappa_N = |C_N|/\sqrt{N} \to \kappa \in (0,\infty)$. Set $\theta^0_{i\to j} = 1$ and denote the associated message passing orbit by $\{\theta^t\}_{t\ge0}$.

Then, for any bounded Lipschitz function $\psi:\mathbb{R}\to\mathbb{R}$, the following limits hold in probability:

$$\lim_{N\to\infty}\frac{1}{|C_N|}\sum_{i\in C_N}\psi(\theta^t_i) = \mathbb{E}[\psi(\mu_t + \tau_t Z)], \tag{2.6}$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i\in[N]\setminus C_N}\psi(\theta^t_i) = \mathbb{E}[\psi(\tau_t Z)]. \tag{2.7}$$

Here expectation is with respect to $Z \sim \mathsf{N}(0,1)$, where $\mu_t$, $\tau_t$ are given by the recursion in Eqs. (2.4), (2.5).

The proof of this lemma is deferred to Section 3.1. Naively, one would like to use the central limit theorem to approximate the distribution on the right-hand side of Eq. (2.2) or of Eq. (2.3) by a Gaussian. This is, however, incorrect because the messages $\theta^t_{\ell\to i}$ depend on the matrix $A$ and hence the summands are not independent. In fact, the lemma would be false if we did not use the edge messages and replaced $\theta^t_{\ell\to i}$ by $\theta^t_\ell$ in Eq. (2.2) or in Eq. (2.3).

However, for the iteration Eq. (2.2), we prove that the distribution of $\theta^t$ is approximately the same as the one we would obtain by using a fresh independent copy of $A$ (given $C_N$) at each iteration. The central limit theorem can then be applied to this modified iteration. In order to prove this approximation, we use the moment method, representing $\theta^t_{i\to j}$ and $\theta^t_i$ as polynomials in the entries of $A$. We then show that the only terms that survive in these polynomials as $N \to \infty$ are the monomials which are of degree 0, 1 or 2 in each entry of $A$.

2.3 Analysis of state evolution

Lemma 2.2 implies that the distribution of $\theta^t_i$ is very different depending on whether $i \in C_N$ or not. If $i \notin C_N$ then $\theta^t_i$ is approximately $\mathsf{N}(0,\tau_t^2)$. If instead $i \in C_N$ then $\theta^t_i$ is approximately $\mathsf{N}(\mu_t,\tau_t^2)$.

Assume that, for some choice of the functions $f(\cdot,\cdot)$ and some $t$, $\mu_t$ is positive and much larger than $\tau_t$. We can then hope to estimate $C_N$ by selecting the indices $i$ such that $\theta^t_i$ is above a certain threshold.¹ This motivates the following result.

Lemma 2.3. Assume that $\lambda\kappa > e^{-1/2}$. Inductively define:

$$p(z,\ell) = \frac{1}{L_\ell}\sum_{k=0}^{d^*}\frac{(\hat\mu_\ell z)^k}{k!}, \qquad \hat\mu_{\ell+1} = \lambda\kappa\,\mathbb{E}[p(\hat\mu_\ell + Z, \ell)], \tag{2.8}$$

where $Z \sim \mathsf{N}(0,1)$ and the recursion is initialized with $p(z,0) = 1$. Here $L_\ell$ is a normalization defined, for all $\ell \ge 1$, by $L_\ell^2 = \mathbb{E}\big\{\big(\sum_{k=0}^{d^*}(\hat\mu_\ell Z)^k/k!\big)^2\big\}$.

Then, for any $M$ finite there exist $d^*$, $t^*$ finite such that $\hat\mu_{t^*} > M$.

By setting $f(\cdot\,,t) = p(\cdot\,,t)$ in the state evolution equations (2.4) and (2.5) we obtain $\mu_t = \hat\mu_t$ and $\tau_t^2 = 1$ for all $t \ge 1$.

The proof of this lemma is deferred to Section 3. Also, the proof clarifies that setting $f(\cdot\,,t) = p(\cdot\,,t)$ is the optimal choice for our message passing algorithm.
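The construction in Lemma 2.3 can be reproduced numerically. The sketch below is our own illustration, not the authors' code: it builds $p(\cdot,\ell)$ as the degree-$d^*$ Taylor polynomial of $e^{\hat\mu_\ell z}$ divided by $L_\ell$, computes the Gaussian expectations in (2.8) with a trapezoidal rule, and encodes the initialization $p(z,0) = 1$ by setting $\hat\mu_0 = 0$. The parameter values ($\lambda\kappa = 0.8$, $d^* = 20$) are arbitrary choices above the threshold $e^{-1/2} \approx 0.607$.

```python
import math
from typing import Callable, List


def gaussian_expectation(g: Callable[[float], float], half_width: float = 12.0, n: int = 4001) -> float:
    """E[g(Z)] for Z ~ N(0, 1), by the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        z = -half_width + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def truncated_exp(mu: float, d_star: int) -> Callable[[float], float]:
    """z -> sum_{k=0}^{d_star} (mu z)^k / k!, the degree-d_star Taylor polynomial of exp(mu z)."""
    coeffs = [mu ** k / math.factorial(k) for k in range(d_star + 1)]

    def q(z: float) -> float:
        acc = 0.0
        for c in reversed(coeffs):  # Horner's rule
            acc = acc * z + c
        return acc

    return q


def make_p(mu_hat: float, d_star: int) -> Callable[[float], float]:
    """p(., l) of Eq. (2.8): the truncated exponential divided by L_l, so that E[p(Z, l)^2] = 1."""
    q = truncated_exp(mu_hat, d_star)
    L = math.sqrt(gaussian_expectation(lambda z: q(z) ** 2))
    return lambda z: q(z) / L


def lemma23_recursion(lam_kappa: float, d_star: int, t_max: int) -> List[float]:
    """Return [mu_hat_0, ..., mu_hat_{t_max}] from Eq. (2.8).

    mu_hat_0 = 0 encodes p(z, 0) = 1: the truncated exponential is then identically 1
    and L_0 = 1 up to quadrature error.
    """
    mu_hats = [0.0]
    for _ in range(t_max):
        mu = mu_hats[-1]
        p = make_p(mu, d_star)
        mu_hats.append(lam_kappa * gaussian_expectation(lambda z: p(mu + z)))
    return mu_hats


mu_hats = lemma23_recursion(lam_kappa=0.8, d_star=20, t_max=3)
print(mu_hats)
```

For moderate $\hat\mu_\ell$ the values closely track the idealized recursion $\hat\mu_{\ell+1} \approx \lambda\kappa\,e^{\hat\mu_\ell^2/2}$ obtained with the exact exponential; larger $d^*$ is needed as $\hat\mu_\ell$ grows.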



¹ The problem is somewhat more subtle because $|C_N| \ll N$; see the next section.



The basic intuition is as follows. Consider the state evolution equations (2.4) and (2.5). Since we are only interested in maximizing the signal-to-noise ratio $\mu_t/\tau_t$, we can always normalize $f(\cdot\,,t)$ so as to have $\tau_{t+1} = 1$. Denoting by $g(\cdot\,,t)$ the un-normalized function, we thus have the recursion

$$\mu_{t+1} = \lambda\kappa\,\frac{\mathbb{E}[g(\mu_t + Z, t)]}{\mathbb{E}[g(Z,t)^2]^{1/2}}.$$

We want to choose $g(\cdot\,,t)$ so as to maximize the right-hand side. It is a simple exercise of calculus to show that this happens for $g(z,t) = e^{\mu_t z}$. For this choice we obtain the iteration $\mu_{t+1} = \lambda\kappa\, e^{\mu_t^2/2}$, which diverges to $+\infty$ if and only if $\lambda\kappa > e^{-1/2}$. Unfortunately the resulting $f$ is not a polynomial and is therefore not covered by Lemma 2.2. Lemma 2.3 deals with this problem by approximating the function $e^{\mu_t z}$ with a polynomial.
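The 'if and only if' for the idealized recursion $\mu_{t+1} = \lambda\kappa\,e^{\mu_t^2/2}$ can be seen directly: a finite fixed point $\mu = \lambda\kappa\,e^{\mu^2/2}$ exists exactly when $\lambda\kappa \le \max_{\mu\ge0}\mu e^{-\mu^2/2} = e^{-1/2}$, the maximum being attained at $\mu = 1$. The short Python sketch below (names and cutoff are our own choices) iterates the recursion from $\mu_0 = 0$, so that $g(\cdot,0) = 1$ and $\mu_1 = \lambda\kappa$, on both sides of the threshold.

```python
import math
from typing import List

THRESHOLD = math.exp(-0.5)  # lambda * kappa must exceed e^{-1/2} ~ 0.6065


def idealized_recursion(lam_kappa: float, t_max: int = 200, cap: float = 30.0) -> List[float]:
    """Iterate mu_{t+1} = lam_kappa * exp(mu_t^2 / 2) from mu_0 = 0.

    Stops early once mu exceeds cap, which we treat as numerical evidence of
    divergence; cap = 30 keeps exp(mu^2 / 2) within floating-point range.
    """
    mus = [0.0]
    for _ in range(t_max):
        if mus[-1] > cap:
            break
        mus.append(lam_kappa * math.exp(mus[-1] ** 2 / 2.0))
    return mus


for lam_kappa in (0.55, 0.70):
    mus = idealized_recursion(lam_kappa)
    print(f"lambda*kappa = {lam_kappa}: {len(mus) - 1} steps, last mu = {mus[-1]:.4g}")

above = idealized_recursion(0.70)  # above the threshold: blows up after a few steps
below = idealized_recursion(0.55)  # below the threshold: settles at a fixed point < 1
```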

2.4 The whole algorithm and general result

As discussed above, after $t$ iterations of the message passing algorithm we obtain a vector $(\theta^t_i)_{i\in[N]}$ wherein, for each $i$, $\theta^t_i$ estimates the likelihood that $i \in C_N$. We can therefore select a 'candidate' subset for $C_N$ by letting $C'_N = \{i \in [N] : \theta^t_i > \mu_t/2\}$ (this choice is motivated by the analysis of the previous section). Since, however, $\theta^t_i$ is approximately $\mathsf{N}(0,\tau_t^2)$ for $i \in [N]\setminus C_N$, this produces a set of size $|C'_N| = \Theta(N)$, much larger than the target $C_N$.

Algorithm 1 Message Passing

1: Initialize: $A(N) = W(N)/\sqrt{N}$; $\theta^0_{i\to j} = 1$ for all $i \ne j \in [N]$; $d^*$, $t^*$ positive integers; $\rho$ a positive constant.
2: Define the sequence of polynomials $p(\cdot\,,t)$ for $t \in \{0,1,\dots\}$ and the values $\hat\mu_t$ as per Lemma 2.3.
3: Run $t^*$ iterations of message passing as in Eqs. (2.2), (2.3) with $f(\cdot\,,t) = p(\cdot\,,t)$.
4: Find the set $C'_N = \{i \in [N] : \theta^{t^*}_i > \hat\mu_{t^*}/2\}$.
5: Let $A_{C'_N}$ be the restriction of $A$ to the rows and columns with index in $C'_N$, and compute by power method its principal eigenvector $u^{t^{**}}$.
6: Compute the set $B_N \subseteq [N]$ of the top $|C_N|$ entries (by absolute value) of $u^{t^{**}}$.
7: Return $\widehat C_N = \{i \in [N] : \phi_N(i) > \lambda|B_N|/2\}$.

In order to overcome this problem, we apply a cleaning procedure to reconstruct $C_N$ from $C'_N$. Let $A_{C'_N}$ be the restriction of $A$ to the rows and columns with index in $C'_N$. By power iteration (i.e. by the iteration $u^{t+1} = A_{C'_N}u^t/\|A_{C'_N}u^t\|_2$, $u^t \in \mathbb{R}^{|C'_N|}$, with $u^0 = (1,1,\dots,1)^{\mathsf T}$) we compute a good approximation $u^{t^{**}}$ of the principal eigenvector of $A_{C'_N}$. We then let $B_N \subseteq [N]$, $|B_N| = |C_N|$, be the set of indices corresponding to the $|C_N|$ largest entries of $u^{t^{**}}$ (in absolute value).

The set $B_N$ has the right size and is approximately equal to $C_N$. We correct the residual 'mistakes' by defining the following score for each vertex $i \in [N]$:

$$\phi_N(i) = \sum_{j\in B_N} W_{ij}\,\mathbb{I}(|W_{ij}| \le \rho), \tag{2.9}$$

and returning the set $\widehat C_N$ of vertices with large scores, e.g. $\widehat C_N = \{i \in [N] : \phi_N(i) > \lambda|B_N|/2\}$.
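The cleaning stage can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it takes the candidate set $C'_N$ as given (here we build it by hand as $C_N$ plus random extra vertices, instead of running message passing), uses a fixed number of power iterations `iters` in place of $t^{**}$, and assumes the $\pm1$ model with $\lambda = 1$. The helper names `planted_instance` and `clean` are hypothetical.

```python
import math
import random
from typing import List, Set, Tuple


def planted_instance(n: int, k: int, rng: random.Random) -> Tuple[Set[int], List[List[int]]]:
    """+/-1 hidden clique instance (lambda = 1): W_ij = +1 inside C_N, a fair sign elsewhere, W_ii = 0."""
    clique = set(rng.sample(range(n), k))
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = 1 if (i in clique and j in clique) else rng.choice((1, -1))
            W[i][j] = W[j][i] = w
    return clique, W


def clean(W: List[List[int]], candidate: Set[int], k: int, lam: float,
          rho: float, iters: int) -> Set[int]:
    """Cleaning step: power method on A restricted to the candidate set, top-k entries, score (2.9)."""
    n = len(W)
    idx = sorted(candidate)
    scale = 1.0 / math.sqrt(n)  # A = W / sqrt(N)
    u = [1.0] * len(idx)  # u^0 = (1, ..., 1)
    for _ in range(iters):
        u = [scale * sum(W[a][b] * u[q] for q, b in enumerate(idx)) for a in idx]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    # B_N: indices of the k largest entries of u in absolute value.
    order = sorted(range(len(idx)), key=lambda p: abs(u[p]), reverse=True)
    B = {idx[p] for p in order[:k]}
    # Eq. (2.9): phi_N(i) = sum_{j in B_N} W_ij 1(|W_ij| <= rho); keep i if phi_N(i) > lam |B_N| / 2.
    phi = [sum(W[i][j] for j in B if abs(W[i][j]) <= rho) for i in range(n)]
    return {i for i in range(n) if phi[i] > lam * len(B) / 2.0}


rng = random.Random(7)
n, k = 400, 100
clique, W = planted_instance(n, k, rng)
# Hypothetical candidate set C'_N: all of C_N plus 100 vertices outside it.
extra = rng.sample(sorted(set(range(n)) - clique), 100)
candidate = clique | set(extra)
recovered = clean(W, candidate, k, lam=1.0, rho=2.0, iters=30)
print(recovered == clique)
```

In this $\pm1$ example the truncation $\mathbb{I}(|W_{ij}| \le \rho)$ never removes an entry; it only matters for unbounded $Q_0$, $Q_1$.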
Note that the 'cleaning' procedure is similar to the algorithm of [AKS98]. The analysis is, however, more challenging because we need to start from a set $C'_N$ that is correlated with the matrix $A$.



Lemma 2.4. Let $A = W/\sqrt{N}$ be defined as above and $C'_N \subseteq [N]$ be any subset of the column indices (possibly dependent on $A$). Assume that it satisfies, for $\varepsilon$ small enough, $|C'_N \cap C_N| > (1-\varepsilon)|C_N|$ and $|C'_N\setminus C_N| < \varepsilon|[N]\setminus C_N|$.

Then there exists $t^{**} = O(\log N)$ (number of iterations in the power method) such that the cleaning procedure gives $\widehat C_N = C_N$ with high probability.

The proof of this lemma can be found in Section 3 and uses large deviation bounds on the principal eigenvalue of $A_{C'_N}$.

The entire algorithm is summarized in Algorithm 1. Notice that the power method has complexity $O(N^2)$ per iteration, and since we only execute $O(\log N)$ iterations, its overall complexity is $O(N^2\log N)$. Finally the scores (2.9) can also be computed in $O(N^2)$ operations. Our analysis of the algorithm results in the following main result, which generalizes Theorem 1.

Theorem 3. Consider the hidden set problem on the complete graph $G_N = K_N$, and assume that $Q_0$ and $Q_1$ are subgaussian probability distributions with mean, respectively, $0$ and $\lambda > 0$. Further assume that $Q_0$ has unit variance.

If $|C_N| > (1+\varepsilon)\sqrt{N/(\lambda^2 e)}$, then there exist $\rho$, $d^*$ and $t^*$ finite such that Algorithm 1 returns $\widehat C_N = C_N$ with high probability on input $W$, with total complexity $O(N^2\log N)$. (More explicitly, there exists $\delta(\varepsilon,N)$ with $\lim_{N\to\infty}\delta(\varepsilon,N) = 0$ such that the algorithm succeeds with probability at least $1-\delta(\varepsilon,N)$.)

Remark 2.5. The above result can be improved if $Q_0$ and $Q_1$ are known, by taking a suitable transformation of the entries $W_{ij}$. In particular, assuming² that $Q_1$ is absolutely continuous with respect to $Q_0$, the optimal such transformation is obtained by setting

$$A_{ij} = \frac{1}{\sqrt{N}}\left(\frac{dQ_1}{dQ_0}(W_{ij}) - 1\right).$$

Here $dP/dQ$ denotes the Radon-Nikodym derivative of $P$ with respect to $Q$. If the resulting $A_{ij}$ is subgaussian with scale $\rho/N$, then our analysis above applies. Theorem 3 remains unchanged, provided the parameter $\lambda$ is replaced by the $\ell_2$ distance between $Q_0$ and $Q_1$:

$$\lambda = \left(\int\left(\frac{dQ_1}{dQ_0}(w) - 1\right)^2 Q_0(dw)\right)^{1/2}. \tag{2.10}$$
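To illustrate Remark 2.5, the sketch below works with finitely supported $Q_0$, $Q_1$ (an assumption made so that all integrals become finite sums and the transformed entries are bounded, hence subgaussian). It assumes the transformation $A_{ij} = \big(\frac{dQ_1}{dQ_0}(W_{ij}) - 1\big)/\sqrt{N}$ together with $\lambda = \big\|\frac{dQ_1}{dQ_0} - 1\big\|_{L^2(Q_0)}$. For the $\pm1$ hidden clique model of the introduction the transformation returns $W_{ij}/\sqrt{N}$ itself and $\lambda = 1$; more generally, when $Q_0$ is a fair sign and $Q_1$ is supported on $\{+1,-1\}$ with $Q_1(+1) = p$, the likelihood ratio is affine in $w$ and $\lambda = |2p-1|$. The function names are hypothetical.

```python
import math
from typing import Dict

Dist = Dict[float, float]  # finitely supported distribution: value -> probability


def likelihood_ratio(q1: Dist, q0: Dist, w: float) -> float:
    """dQ1/dQ0 at a point w of the support of Q0."""
    return q1.get(w, 0.0) / q0[w]


def transformed_entry(q1: Dist, q0: Dist, w: float, n_vertices: int) -> float:
    """A_ij = (dQ1/dQ0(W_ij) - 1) / sqrt(N), the transformation assumed above."""
    return (likelihood_ratio(q1, q0, w) - 1.0) / math.sqrt(n_vertices)


def lambda_l2(q1: Dist, q0: Dist) -> float:
    """lambda = || dQ1/dQ0 - 1 ||_{L2(Q0)}; requires Q1 to be absolutely continuous w.r.t. Q0."""
    if any(p > 0 and w not in q0 for w, p in q1.items()):
        raise ValueError("Q1 puts mass outside the support of Q0")
    return math.sqrt(sum(p0 * (likelihood_ratio(q1, q0, w) - 1.0) ** 2 for w, p0 in q0.items()))


q0 = {1.0: 0.5, -1.0: 0.5}         # fair random sign
clique_q1 = {1.0: 1.0}             # hidden clique: always +1
biased_q1 = {1.0: 0.8, -1.0: 0.2}  # a biased coin inside the hidden set
print(lambda_l2(clique_q1, q0), lambda_l2(biased_q1, q0))
```

In this two-point case the transformation is linear in $W_{ij}$, so it cannot improve on the raw entries; the gain promised by the remark appears only when the likelihood ratio is a non-affine function of $w$.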



3 Proof of Theorem 3

In this section we present the proof of Theorem 3 and of the auxiliary lemmas stated in Section 2.

We begin by showing how these technical lemmas imply Theorem 3. First consider a sequence of instances with $\lim_{N\to\infty}|C_N|/\sqrt{N} = \lim_{N\to\infty}\kappa_N = \kappa$ such that $\kappa\lambda > 1/\sqrt{e}$. We will prove that Algorithm 1 returns $\widehat C_N = C_N$ with probability converging to one as $N \to \infty$.



² If $Q_1$ is singular with respect to $Q_0$ the problem is simpler but requires a bit more care.
By Lemma 2.2 we have, in probability,

$$\lim_{N\to\infty}\frac{|C'_N \cap C_N|}{|C_N|} = \lim_{N\to\infty}\frac{1}{|C_N|}\sum_{i\in C_N}\mathbb{I}\big(\theta^{t^*}_i > \mu_{t^*}/2\big) = \mathbb{E}\big\{\mathbb{I}\big(\mu_{t^*} + \tau_{t^*}Z > \mu_{t^*}/2\big)\big\}.$$

Notice that, in the second step, we applied Lemma 2.2 to the function $\psi(x) = \mathbb{I}(x > \mu_{t^*}/2)$. While this is not Lipschitz continuous, it can be approximated from above and below pointwise by Lipschitz continuous functions. This is sufficient to obtain the claimed convergence as in standard weak convergence arguments [Bil08].

Since we used $f(\cdot\,,t) = p(\cdot\,,t)$, we have, by Lemma 2.3, $\mu_t = \hat\mu_t$ and $\tau_t = 1$. Denoting by $\Phi(z) = \int_{-\infty}^{z} e^{-x^2/2}\,dx/\sqrt{2\pi}$ the Gaussian distribution function, we thus have

$$\lim_{N\to\infty}\frac{|C'_N \cap C_N|}{|C_N|} = 1 - \Phi(-\hat\mu_{t^*}/2) \ge 1 - e^{-\hat\mu_{t^*}^2/8},$$

where in the last step we used $\Phi(-a) \le e^{-a^2/2}$ for $a \ge 0$. By Lemma 2.3 we can choose $d^*$, $t^*$ so that $\hat\mu_{t^*} > M$; taking $M^2 > 8\log(2/\varepsilon)$ we ensure that the last expression is larger than $1-\varepsilon/2$, and therefore $|C'_N \cap C_N| > (1-\varepsilon)|C_N|$ with high probability.

By a similar argument we have, in probability,

$$\lim_{N\to\infty}\frac{|C'_N\setminus C_N|}{N} = \Phi(-\hat\mu_{t^*}/2) < \frac{\varepsilon}{2},$$

and hence $|C'_N\setminus C_N| < \varepsilon|[N]\setminus C_N|$ with high probability.
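The choice of $M$ in the argument above is easy to check numerically. The sketch below evaluates $\Phi$ through `math.erfc`, using $\Phi(x) = \tfrac{1}{2}\operatorname{erfc}(-x/\sqrt{2})$, and verifies for a few illustrative values of $\varepsilon$ that already $\hat\mu_{t^*} = M$ with $M^2 = 8\log(2/\varepsilon)$ gives $1 - \Phi(-M/2) \ge 1 - \varepsilon/2$ and $\Phi(-M/2) \le \varepsilon/2$. The function names are our own.

```python
import math


def Phi(x: float) -> float:
    """Standard Gaussian CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def required_M(eps: float) -> float:
    """M with M^2 = 8 log(2/eps), so that exp(-M^2/8) = eps/2."""
    return math.sqrt(8.0 * math.log(2.0 / eps))


for eps in (0.1, 0.01, 1e-4):
    M = required_M(eps)
    print(f"eps = {eps:g}: M = {M:.3f}, clique fraction >= {1 - Phi(-M / 2):.6f}, "
          f"non-clique fraction <= {Phi(-M / 2):.2e}")
```

Because $\Phi(-a) \le \tfrac12 e^{-a^2/2}$ actually holds for $a \ge 0$, the bound used in the proof has a factor-two slack.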

We can therefore apply Lemma 2.4 and conclude that Algorithm 1 succeeds with high probability for $\kappa_N = |C_N|/\sqrt{N} \to \kappa > 1/\sqrt{\lambda^2 e}$.

In order to complete the proof, we need to prove that Algorithm 1 succeeds with probability at least $1-\delta(\varepsilon,N)$ for all $|C_N| > (1+\varepsilon)\sqrt{N/(\lambda^2 e)}$. Notice that, without loss of generality, we can assume $\kappa_N \in [(1+\varepsilon)/\sqrt{\lambda^2 e}, K]$ with $K$ a large enough constant (because for $\kappa_N > K$ the problem becomes easier and, for instance, the proof of [AKS98] already works). If the claim were false, there would be a sequence of values $\{\kappa_N\}_{N\ge1}$ indexed by $N$ such that the success probability remains bounded away from one along the sequence. But since $[(1+\varepsilon)/\sqrt{\lambda^2 e}, K]$ is compact, this sequence has a converging subsequence along which the success probability remains bounded away from one. This contradicts the above.

3.1 Proof of Lemma 2.2

It is convenient to collate the assumptions we make on our problem instances as follows.
Definition 3.1. We say $\{A(N),\mathcal{F}_N,\theta^0_N\}_{N\ge1}$ is a $(C,d)$-regular sequence if:

1. For each $N$, $A(N) = W(N)/\sqrt{N}$, where $W(N)$ satisfies Assumption 2.1.

2. For each $t\ge0$, $f(\cdot,t)\in\mathcal{F}_N$ is a polynomial with maximum degree $d$ and coefficients bounded in absolute value by $C$.

3. Each entry of the initial condition $\theta^0_N$ is 1.

Let $A^t$, $t\ge1$, be i.i.d. matrices distributed as $A$ conditional on the set $C_N$, and let $A^0 = A$. We now define the sequence of $N\times N$ matrices $\{\xi^t\}_{t\ge0}$ and a sequence of vectors in $\mathbb{R}^N$, $\{\xi^t\}_{t\ge1}$ (indexed as before), given by:

$$\xi^{t+1}_{i\to j} = \sum_{\ell\in[N]\setminus\{i,j\}}A^t_{\ell i}\,f(\xi^t_{\ell\to i},t), \tag{3.1}$$

$$\xi^{t+1}_{i} = \sum_{\ell\in[N]\setminus i}A^t_{\ell i}\,f(\xi^t_{\ell\to i},t), \tag{3.2}$$

for all $t\ge0$ and $i\in[N]$, with $\xi^0 = \theta^0$.

The asymptotic marginals of the iterates $\xi^t$ are easier to compute, since the matrix $A^t$ is independent of $\xi^t$ by construction. We therefore proceed by proving that $\xi^t$ and $\theta^t$ have, asymptotically in $N$, the same moments of all orders, and then computing the distribution of $\xi^t$.
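To make the two recursions concrete, here is a minimal NumPy sketch of one message-passing step on a dense instance. A driver either reuses $A$ at every step (the orbit $\theta^t$) or, for $t\ge1$, draws a fresh matrix as in Eqs. (3.1)-(3.2) (the auxiliary orbit $\xi^t$). Messages are stored in an $N\times N$ array with `theta[l, i]` $=\theta_{\ell\to i}$. All function names are hypothetical. The Gaussian choice $Q_0 = \mathsf{N}(0,1)$, $Q_1 = \mathsf{N}(\lambda,1)$ in `sample_A` is an assumption made only for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def sample_A(N, clique, lam, rng):
    """Symmetric A = W / sqrt(N) with W_ij ~ N(0, 1) off the clique and N(lam, 1) on it.

    Gaussian entries are an illustrative assumption, not the general model of the text."""
    W = np.triu(rng.standard_normal((N, N)), 1)
    W = W + W.T
    in_clique = np.zeros(N, dtype=bool)
    in_clique[list(clique)] = True
    W[np.ix_(in_clique, in_clique)] += lam
    np.fill_diagonal(W, 0.0)
    return W / np.sqrt(N)


def mp_step(A, theta, coeffs):
    """One step: returns (new, vertex) with
    new[i, j] = sum_{l not in {i, j}} A[l, i] f(theta[l, i]),
    vertex[i] = sum_{l != i} A[l, i] f(theta[l, i]),
    where f(x) = sum_k coeffs[k] * x**k."""
    F = P.polyval(theta, coeffs)
    np.fill_diagonal(F, 0.0)            # the term l == i never appears
    contrib = A * F                     # contrib[l, i] = A[l, i] f(theta_{l -> i})
    vertex = contrib.sum(axis=0)        # sum over l != i
    new = vertex[:, None] - contrib.T   # drop the l == j term: contrib[j, i] = contrib.T[i, j]
    np.fill_diagonal(new, 0.0)
    return new, vertex


def run_mp(A, coeffs_by_t, resample=None):
    """Iterate from all-ones messages. If `resample` is a callable returning a fresh
    matrix, use A^0 = A and A^t = resample() for t >= 1 (the xi-recursion)."""
    N = A.shape[0]
    theta, vertex = np.ones((N, N)), np.ones(N)
    for t, coeffs in enumerate(coeffs_by_t):
        A_t = A if (resample is None or t == 0) else resample()
        theta, vertex = mp_step(A_t, theta, coeffs)
    return theta, vertex
```

Subtracting the single term $A_{ji}f(\theta_{j\to i})$ from the vertex sum gives all $N^2$ messages in $O(N^2)$ time per step, instead of $O(N^3)$ for the naive double loop.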

The messages $\theta^t_{i\to j}$ and $\xi^t_{i\to j}$ can be described explicitly via a sum over a family of finite rooted labeled trees. We now describe this family in detail. All edges are assumed directed towards the root. The leaves of the tree are those vertices with no children, and the set of leaves is denoted by $L(T)$. We let $V(T)$ denote the set of vertices of $T$ and $E(T)$ the set of (directed) edges in $T$. The root has a label in $[N]$ called its "type". Every non-root vertex has a label in $[N]\times\{0,1,\dots,d\}$: the first argument of the label is the "type" of the vertex, and the second is the "mark". For a vertex $v\in T$ we let $l(v)$ denote its type, $r(v)$ its mark and $|v|$ its distance from the root in $T$.

Definition 3.2. Let $\mathcal{T}^t$ be the family of labeled trees $T$ with exactly $t$ generations satisfying the conditions:

1. The root of $T$ has degree 1.

2. Any path $v_1,v_2,\dots,v_k$ in the tree is non-backtracking, i.e. the types $l(v_i),l(v_{i+1}),l(v_{i+2})$ are distinct.

3. For a vertex $u$ that is neither the root nor a leaf, the mark $r(u)$ is set to the number of children of $u$.

4. We have $t = \max_{v\in V(T)}|v|$. All leaves $u\in L(T)$ with non-maximal depth, i.e. $|u|\le t-1$, have mark 0.

Let $\mathcal{T}^t_{i\to j}\subseteq\mathcal{T}^t$ be the subfamily satisfying, in addition, the following:

1. The type of the root is $i$.

2. The root has a single child with type distinct from $i$ and $j$.

In a similar fashion, let $\mathcal{T}^t_i\subseteq\mathcal{T}^t$ be the subfamily satisfying, additionally:

1. The type of the root is $i$.

2. The root has a single child with type distinct from $i$.
Let the polynomial $f(x,t)$ be represented as:

$$f(x,t) = \sum_{i=0}^{d}q^t_i\,x^i.$$

For a labeled tree $T\in\mathcal{T}^t$ and vector of coefficients $q = (q^s_i)_{s<t,\,i\le d}$ we now define three weights:

$$A(T) = \prod_{u\to v\in E(T)}A_{l(u)l(v)}, \tag{3.3}$$

$$\Gamma(T,q,t) = \prod_{u\to v\in E(T)}q^{t-|u|}_{r(u)}, \tag{3.4}$$

$$\Theta(T) = \prod_{u\in L(T)}\big(\theta^0_{l(u)}\big)^{r(u)}. \tag{3.5}$$

We are now in a position to provide an explicit expression for $\theta^t_{i\to j}$ in terms of a summation over an appropriate family of labeled trees.

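Lemma 3.3. Let $\{A(N),\mathcal{F}_N,\theta^0_N\}$ be a $(C,d)$-regular sequence. The orbit $\theta^t$ satisfies:

$$\theta^t_{i\to j} = \sum_{T\in\mathcal{T}^t_{i\to j}}A(T)\,\Gamma(T,q,t)\,\Theta(T), \tag{3.6}$$

$$\theta^t_{i} = \sum_{T\in\mathcal{T}^t_{i}}A(T)\,\Gamma(T,q,t)\,\Theta(T). \tag{3.7}$$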

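Proof. We prove Eq. (3.6) using induction. The proof of Eq. (3.7) is very similar. We have, by definition, that:

$$\theta^1_{i\to j} = \sum_{\ell\in[N]\setminus\{i,j\}}A_{\ell i}\,f(\theta^0_{\ell\to i},0) = \sum_{\ell\in[N]\setminus\{i,j\}}\ \sum_{k\le d}A_{\ell i}\,q^0_k\,(\theta^0_{\ell\to i})^k.$$

This is what is given by Eq. (3.6), since $\mathcal{T}^1_{i\to j}$ is exactly the set of trees with two vertices joined by a single edge, the root having type $i$, and the other vertex (say $v$) having type $l(v)\notin\{i,j\}$ and mark $r(v)\le d$.

Now we assume Eq. (3.6) to be true up to $t$. For iteration $t+1$, we obtain by definition:

$$\theta^{t+1}_{i\to j} = \sum_{\ell\in[N]\setminus\{i,j\}}\ \sum_{k\le d}A_{\ell i}\,q^t_k\,(\theta^t_{\ell\to i})^k = \sum_{\ell\in[N]\setminus\{i,j\}}\ \sum_{k\le d}\ \sum_{T_1,\dots,T_k\in\mathcal{T}^t_{\ell\to i}}A_{\ell i}\,q^t_k\prod_{m=1}^{k}A(T_m)\,\Gamma(T_m,q,t)\,\Theta(T_m).$$

Notice that $\mathcal{T}^{t+1}_{i\to j}$ is in bijection with the set of pairs consisting of a vertex of type $\ell\notin\{i,j\}$ and a $k$-tuple of trees belonging to $\mathcal{T}^t_{\ell\to i}$. Indeed, one can form a tree in $\mathcal{T}^{t+1}_{i\to j}$ as follows. Choose a root with type $i$ and its child $v$ with type $\ell\notin\{i,j\}$. Then choose a $k$-tuple ($k\le d$) of trees from $\mathcal{T}^t_{\ell\to i}$, identify their roots with $v$, and set $r(v) = k$. With this, absorbing the factor $A_{\ell i}$ into $\prod_{m=1}^{k}A(T_m)$ and $q^t_k$ into $\prod_{m=1}^{k}\Gamma(T_m,q,t)$ yields the desired claim. □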

From a very similar argument as above we obtain that:

$$\xi^t_{i\to j} = \sum_{T\in\mathcal{T}^t_{i\to j}}\tilde{A}(T)\,\Gamma(T,q,t)\,\Theta(T),$$

$$\xi^t_{i} = \sum_{T\in\mathcal{T}^t_{i}}\tilde{A}(T)\,\Gamma(T,q,t)\,\Theta(T),$$

where the weight $\tilde{A}(T)$ for a labeled tree $T$ is defined (similarly to Eq. (3.3)) by:

$$\tilde{A}(T) = \prod_{u\to v\in E(T)}A^{t-|u|}_{l(u)l(v)}. \tag{3.8}$$

We now prove that the moments of $\theta^t$ and $\xi^t$ are asymptotically (in the large-$N$ limit) the same, via the following:

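Proposition 3.4. Let $\{A(N),\mathcal{F}_N,\theta^0_N\}$ be a $(C,d)$-regular sequence, with the conditions above. Then, for any $t\ge1$ and $m\ge1$, there exists a constant $K$ independent of $N$ (depending possibly on $m,t,d,C$) such that for any $i\in[N]$ and $j\neq i$:

$$\Big|\mathbb{E}\big[(\theta^t_{i\to j})^m\big]-\mathbb{E}\big[(\xi^t_{i\to j})^m\big]\Big|\le\frac{K}{\sqrt{N}},\qquad\Big|\mathbb{E}\big[(\theta^t_{i})^m\big]-\mathbb{E}\big[(\xi^t_{i})^m\big]\Big|\le\frac{K}{\sqrt{N}}.$$

Proof. According to our initial condition, $\theta^0_N$ has all entries 1, so that $\Theta(T) = 1$. Then, using the tree representation, we have that:

$$\mathbb{E}\big[(\theta^t_i)^m\big] = \sum_{T_1,\dots,T_m\in\mathcal{T}^t_i}\ \prod_{\ell=1}^{m}\Gamma(T_\ell,q,t)\ \mathbb{E}\Big[\prod_{\ell=1}^{m}A(T_\ell)\Big], \tag{3.9}$$

$$\mathbb{E}\big[(\xi^t_i)^m\big] = \sum_{T_1,\dots,T_m\in\mathcal{T}^t_i}\ \prod_{\ell=1}^{m}\Gamma(T_\ell,q,t)\ \mathbb{E}\Big[\prod_{\ell=1}^{m}\tilde{A}(T_\ell)\Big]. \tag{3.10}$$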



Define the multiplicity $\phi(T)_{rs}$ to be the number of occurrences of an edge $u\to v$ in the tree $T$ with types $\{l(u),l(v)\} = \{r,s\}$. Also let $G$ denote the graph obtained by identifying vertices of the same type in the tuple of trees $T_1,\dots,T_m$. We let $G|_{C_N}$ denote its restriction to the vertices in $C_N$, and $G|_{C_N^c}$ its restriction to $C_N^c$. Let $E(G|_{C_N})$ and $E(G|_{C_N^c})$ denote the (disjoint) edge sets of these graphs, and let $E_J$ denote the edges in $G$ present in neither $G|_{C_N}$ nor $G|_{C_N^c}$. In other words, $E_J$ consists of all edges in $G$ with one endpoint in $C_N$ and the other outside it. The edge sets here do not count multiplicity.

For the analysis, we first split the sum over $m$-tuples of trees above into three terms as follows:

1. $S(A)$: the sum over all $m$-tuples of trees $T_1,\dots,T_m$ such that there exists an edge $rs$ in $E(G|_{C_N^c})\cup E_J$ which is covered at least 3 times.

2. $R(A)$: the sum over all $m$-tuples of trees such that each edge in $E(G|_{C_N^c})\cup E_J$ is covered exactly twice, and the graph $G$ contains a cycle.

3. $T(A)$: the sum over all $m$-tuples of trees such that each edge in $E(G|_{C_N^c})\cup E_J$ is covered exactly twice, and the graph $G$ is a tree.

We also define the analogous terms $S(\tilde{A})$, $R(\tilde{A})$ and $T(\tilde{A})$ in the same fashion. We have $\big|\prod_{\ell=1}^{m}\Gamma(T_\ell,q,t)\big|\le C^{md^t}$, since the coefficients are bounded by $C$ and the number of edges in each tree is at most $d^t$. We thus concentrate on the factor $\mathbb{E}\big[\prod_{\ell=1}^{m}A(T_\ell)\big]$. Suppose some edge in $E(G|_{C_N^c})\cup E_J$ is covered exactly once. Then $\mathbb{E}\big[\prod_{\ell=1}^{m}A(T_\ell)\big] = 0 = \mathbb{E}\big[\prod_{\ell=1}^{m}\tilde{A}(T_\ell)\big]$, because the corresponding centered entry appears only once, hence in a single generation. Consequently, we need only consider the contributions $S(A)$, $R(A)$ and $T(A)$ above in the sums of Eqs. (3.9) and (3.10).
We first consider the contribution $S(A)$. We have:

$$\Big|\mathbb{E}\Big[\prod_{\ell=1}^{m}A(T_\ell)\Big]\Big|\le\mathbb{E}\Big[\prod_{j<k}|A_{jk}|^{\sum_{\ell=1}^{m}\phi(T_\ell)_{jk}}\Big] = \prod_{j<k}\mathbb{E}\Big[|A_{jk}|^{\sum_{\ell=1}^{m}\phi(T_\ell)_{jk}}\Big]\le C_1(\sqrt{N})^{-a}, \tag{3.11}$$

where $a = a(T_1,\dots,T_m)$ is the total number of edges (with multiplicity) in the tuple of trees $T_1,\dots,T_m$. The last inequality follows from Lemma A.2 and the observation that, for any $j,k$, $A_{jk}$ is subgaussian with scale parameter $\rho/N$. The constant $C_1 = C_1(T_1,\dots,T_m)$ absorbs the leading factors from Lemma A.2 and is independent of $N$.

To track the dependence on $N$, note that the graph $G$ is connected, since the roots of all the trees have type $i$. Let $n(G|_{C_N})$ [resp. $n(G|_{C_N^c})$] denote the number of vertices in $G|_{C_N}$ [resp. $G|_{C_N^c}$], not counting the root. Counting the edges with multiplicities, we have $3+|E(G|_{C_N})|+2\big(|E(G|_{C_N^c})|+|E_J|-1\big)\le a$, implying $|E(G|_{C_N})|+2\big(|E(G|_{C_N^c})|+|E_J|\big)\le a-1$. Two facts bound the vertex counts: $G$ is connected, and each component of $G|_{C_N^c}$ is joined by at least one edge to a vertex of $G|_{C_N}$. Hence $n(G|_{C_N^c})+n(G|_{C_N})\le|E_J|+|E(G|_{C_N^c})|+|E(G|_{C_N})|$ and $n(G|_{C_N^c})\le|E_J|+|E(G|_{C_N^c})|$. Combining, we get:

$$n(G|_{C_N})+2n(G|_{C_N^c})\le|E(G|_{C_N})|+2\big(|E(G|_{C_N^c})|+|E_J|\big)\le a-1.$$
For a candidate graph $G$, the number of possible labelings of types is upper bounded by

$$\#\{\text{labelings of } G\}\le2^{n(G|_{C_N})+n(G|_{C_N^c})}\,(\kappa\sqrt{N})^{n(G|_{C_N})}\,N^{n(G|_{C_N^c})}\le(4\kappa\sqrt{N})^{a-1}$$

for large enough $N$. Denote by $\mathcal{U}^t$ the set of trees $\mathcal{T}^t$ with the labels removed. We then have, using the above and Eq. (3.11),

$$|S(A)|\le C^{md^t}\sum_{(\mathcal{U}^t)^m}C_1(\sqrt{N})^{-a}(4\kappa\sqrt{N})^{a-1}\le C_2N^{-1/2}, \tag{3.12}$$

where we absorbed the summation over $(\mathcal{U}^t)^m$ into $C_2$, since it is independent of $N$. The constant $C_2$ also accounts for the fact that the same tuple of (unlabeled) trees can yield different (candidate) graphs $G$; however, their total number is independent of $N$.

Indeed, a similar calculation shows that $|R(A)|$ is $O(N^{-1/2})$. For such a graph, $n(G|_{C_N})+n(G|_{C_N^c}) = |E(G|_{C_N})|+|E(G|_{C_N^c})|+|E_J|-c$ for some $c\ge1$, since $G$ has at least one cycle. We have $|E(G|_{C_N})|+2\big(|E(G|_{C_N^c})|+|E_J|\big)\le a$ by counting minimum multiplicities, and $n(G|_{C_N^c})\le|E(G|_{C_N^c})|+|E_J|$ by the connectivity argument. Thus:

$$n(G|_{C_N})+2n(G|_{C_N^c})\le|E(G|_{C_N})|+2\big(|E(G|_{C_N^c})|+|E_J|\big)-1\le a-1.$$

The number of possible labelings of $G$ is thus bounded above by $(4\kappa\sqrt{N})^{a-1}$. Following the same argument as before, we get:

$$|R(A)|\le C_3N^{-1/2}, \tag{3.13}$$



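for some constant $C_3$ dependent only on $m,d,t,\kappa$. We note here that the same bounds hold for $S(\tilde{A})$ and $R(\tilde{A})$. Indeed, let $\phi_g(T)_{rs}$ denote the number of times an edge $u\to v$ of (distinct) types $\{l(u),l(v)\} = \{r,s\}$ is covered with $|u| = g$. By definition, $\sum_g\phi_g(T)_{rs} = \phi(T)_{rs}$. We then obtain:

$$\Big|\mathbb{E}\Big[\prod_{\ell=1}^{m}\tilde{A}(T_\ell)\Big]\Big|\le\prod_{j<k}\prod_{g}\mathbb{E}\Big[|A^{t-g}_{jk}|^{\sum_{\ell=1}^{m}\phi_g(T_\ell)_{jk}}\Big]\le C_1(\sqrt{N})^{-a}. \tag{3.14}$$

This can be used in place of Eq. (3.11) to obtain the required bounds on $S(\tilde{A})$ and $R(\tilde{A})$.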

By the bounds of Eqs. (3.12) and (3.13), to prove our result we only need to concentrate on $T(A)$. It suffices to show that $T(A) = T(\tilde{A})$. We first consider the case $\mathbb{E}\big[\prod_{\ell=1}^{m}A(T_\ell)\big]\neq0=\mathbb{E}\big[\prod_{\ell=1}^{m}\tilde{A}(T_\ell)\big]$. This implies that there exists an edge $rs$ in $E(G|_{C_N^c})\cup E_J$ with multiplicity 2, whose two occurrences lie in different generations in the tuple of trees. Suppose the two occurrences appear on the same branch of a tree, call it $T_1$. Then there exist edges $u\to v$ and $u'\to v'$ with $\{l(u),l(v)\} = \{l(u'),l(v')\} = \{r,s\}$ and with $u\to v$ on the path from $u'$ to the root. Due to the non-backtracking property, $u\neq v'$. However, these edges then form a cycle in $G$ (formed by the edges on the path from $v'$ to $u$), because the tree is non-backtracking, and we arrive at a contradiction. Now suppose the edges $u\to v$ and $u'\to v'$ as above appear in different generations in distinct trees $T_1$ and $T_2$, respectively. Then, since the roots of the $T_\ell$'s are identified to the same vertex and the trees are non-backtracking, these edges form a cycle in $G$, and we arrive at a contradiction. The same argument shows that such edges $u\to v$ and $u'\to v'$ cannot exist even on different branches of the same tree in different generations.

Now assume $\mathbb{E}\big[\prod_{\ell=1}^{m}\tilde{A}(T_\ell)\big]\neq0$. This means that every edge in $E(G|_{C_N^c})\cup E_J$ is covered exactly twice in the same generation, and every edge in $E(G|_{C_N})$ is covered at most twice. Then, if $\mathbb{E}\big[\prod_{\ell=1}^{m}A(T_\ell)\big]\neq\mathbb{E}\big[\prod_{\ell=1}^{m}\tilde{A}(T_\ell)\big]$, there must exist an edge $rs\in E(G|_{C_N})$ covered twice, i.e. with multiplicity 2, but in two different generations. However, by the argument given previously, this is not possible. We thus obtain $T(A) = T(\tilde{A})$. This identity, together with the bounds on $S(A)$, $R(A)$, $S(\tilde{A})$ and $R(\tilde{A})$, yields the required result, for an appropriately adjusted leading constant $K$ depending on $m$, $d$, $t$ and $\kappa$.

□

Before proceeding, we prove the following results that are useful to establish state 
evolution. 



Lemma 3.5. Consider the situation as assumed in Proposition 3.4. Then, for some constants $K_m(m,d,t,\kappa)$ and $K'_m(m,d,t,\kappa)$ independent of $N$, we have:

$$\big|\mathbb{E}\big[(\xi^t_{i\to j})^m\big]\big|\le K_m,\qquad\big|\mathbb{E}\big[(\xi^t_i)^m\big]\big|\le K'_m.$$

Proof. We prove the claim for $\xi^t_{i\to j}$. The other claim follows by essentially the same argument. Recall from the tree representation (Lemma 3.3, with the weights (3.8)):

$$\mathbb{E}\big[(\xi^t_{i\to j})^m\big] = \sum_{T_1,\dots,T_m\in\mathcal{T}^t_{i\to j}}\ \prod_{\ell=1}^{m}\Gamma(T_\ell,q,t)\ \mathbb{E}\Big[\prod_{\ell=1}^{m}\tilde{A}(T_\ell)\Big].$$

Split the contributions to the above sum into $S(\tilde{A})$, $R(\tilde{A})$ and $T(\tilde{A})$ as in Proposition 3.4. It is then sufficient to prove that $|T(\tilde{A})|$ is bounded uniformly in $N$. We have:

$$|T(\tilde{A})| = \Big|\sum_{T_1,\dots,T_m}\ \prod_{\ell=1}^{m}\Gamma(T_\ell,q,t)\ \mathbb{E}\Big[\prod_{\ell=1}^{m}\tilde{A}(T_\ell)\Big]\Big|\le C^{md^t}\sum_{T_1,\dots,T_m}C_1(\sqrt{N})^{-a},$$

where $a = a(T_1,\dots,T_m)$ is the number of edges counted with multiplicity. The sum runs over $m$-tuples of trees $T_1,\dots,T_m$ such that the graph $G$ (formed by identifying vertices of the same type) is a tree. Define $n(G|_{C_N})$, $n(G|_{C_N^c})$, $E(G|_{C_N})$, $E(G|_{C_N^c})$ and $E_J$ as in Proposition 3.4. By an argument similar to that for bounding $R(A)$, we obtain $n(G|_{C_N})+2n(G|_{C_N^c})\le a$. Thus we get:

$$|T(\tilde{A})|\le C^{md^t}\sum_{(\mathcal{U}^t)^m}C_1(\sqrt{N})^{-a}(4\kappa\sqrt{N})^{a}\le K_m,$$

where $K_m = K(m,d,t,\kappa)$ is a constant independent of $N$. For convenience, we make only the dependence on $m$ explicit. The other contributions $S(\tilde{A})$ and $R(\tilde{A})$ are $O(N^{-1/2})$, so the result follows after a small change in the constant $K_m$. □

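Lemma 3.6. Consider the situation as in Proposition 3.4. Then, for every $m\ge1$ and every fixed $j\in[N]$, we have:

$$\lim_{N\to\infty}\mathrm{Var}\Big(\frac{1}{|C_N|}\sum_{i\in C_N}(\xi^t_i)^m\Big) = 0,\qquad\lim_{N\to\infty}\mathrm{Var}\Big(\frac{1}{|C_N|}\sum_{i\in C_N\setminus j}(\xi^t_{i\to j})^m\Big) = 0,$$

$$\lim_{N\to\infty}\mathrm{Var}\Big(\frac{1}{N}\sum_{i\in[N]\setminus C_N}(\xi^t_i)^m\Big) = 0,\qquad\lim_{N\to\infty}\mathrm{Var}\Big(\frac{1}{N}\sum_{i\in[N]\setminus(C_N\cup j)}(\xi^t_{i\to j})^m\Big) = 0,$$

$$\lim_{N\to\infty}\mathrm{Var}\Big(\frac{1}{N}\sum_{i\in[N]}(\xi^t_i)^m\Big) = 0,\qquad\lim_{N\to\infty}\mathrm{Var}\Big(\frac{1}{N}\sum_{i\in[N]\setminus j}(\xi^t_{i\to j})^m\Big) = 0,$$

where $\mathrm{Var}(\cdot)$ denotes the variance of the argument. The same results hold with $\theta^t$ instead of $\xi^t$.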

Proof. We prove only the first claim in detail. The proofs of the remaining claims follow the same analysis. To begin with:

$$\mathrm{Var}\Big(\frac{1}{|C_N|}\sum_{i\in C_N}(\xi^t_i)^m\Big) = \frac{1}{|C_N|^2}\sum_{i,j\in C_N}\Big(\mathbb{E}\big[(\xi^t_i)^m(\xi^t_j)^m\big]-\mathbb{E}\big[(\xi^t_i)^m\big]\,\mathbb{E}\big[(\xi^t_j)^m\big]\Big).$$

Note that, by Lemma 3.5, the terms with $i = j$ contribute $O(|C_N|)$ to the sum. We now control each of the remaining summands, where $i,j$ are distinct, as follows. Fix a pair $i,j$. The summand $\mathbb{E}[(\xi^t_i)^m(\xi^t_j)^m]-\mathbb{E}[(\xi^t_i)^m]\,\mathbb{E}[(\xi^t_j)^m]$ can be written as a summation over $2m$-tuples of trees $T_1,\dots,T_m,T'_1,\dots,T'_m$. Here the first $m$ trees belong to $\mathcal{T}^t_i$ and the last $m$ to $\mathcal{T}^t_j$. Let $G$ denote the simple graph obtained by identifying vertices of the same type in the tuple $T_1,\dots,T_m,T'_1,\dots,T'_m$. Let $G|_{C_N}$ and $G|_{C_N^c}$ be the subgraphs defined as in Proposition 3.4.

Consider first the terms in which $G$ is disconnected, with one component containing $i$ and the other containing $j$. These terms are identical in $\mathbb{E}[(\xi^t_i)^m(\xi^t_j)^m]$ and $\mathbb{E}[(\xi^t_i)^m]\,\mathbb{E}[(\xi^t_j)^m]$, and hence cancel. If $G$ is connected, the argument of Proposition 3.4 shows that some terms have vanishing contributions: those where $G$ is not a tree, and those where $G|_{C_N^c}$ contains an edge covered three or more times. It remains to check the contributions of terms where $G$ is a connected tree and every edge in $G|_{C_N^c}$ is covered exactly twice. Define $n(G|_{C_N})$, $n(G|_{C_N^c})$, $E(G|_{C_N})$, $E(G|_{C_N^c})$ and $E_J$ as before. Since the types $i$ and $j$ have been fixed, we have $n(G|_{C_N})+n(G|_{C_N^c})\le|E(G|_{C_N})|+|E(G|_{C_N^c})|+|E_J|-1$. As before, $n(G|_{C_N^c})\le|E(G|_{C_N^c})|+|E_J|$ by connectivity. Moreover, $|E(G|_{C_N})|+2\big(|E(G|_{C_N^c})|+|E_J|\big)\le a$, where $a$ is the number of edges counted with multiplicity. This yields $n(G|_{C_N})+2n(G|_{C_N^c})\le a-1$. The total number of such terms is thus at most $O(N^{(a-1)/2})$, while their weight is bounded by $O(N^{-a/2})$. Consequently, their overall contribution vanishes in the limit. We thus have, for all $i\neq j\in C_N$:

$$\Big|\mathbb{E}\big[(\xi^t_i)^m(\xi^t_j)^m\big]-\mathbb{E}\big[(\xi^t_i)^m\big]\,\mathbb{E}\big[(\xi^t_j)^m\big]\Big|\le\epsilon(N),$$

where $\epsilon(N)\to0$ as $N\to\infty$. This gives:

$$\mathrm{Var}\Big(\frac{1}{|C_N|}\sum_{i\in C_N}(\xi^t_i)^m\Big)\le O(|C_N|^{-1})+\epsilon(N),$$

and the first claim follows.

The other claims follow by the same argument, using also $|C_N| = o(N)$. □



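Proposition 3.7. Let $\mu_t,\tau_t$ be given as in Eqs. (2.4), (2.5). Consider $\{A(N),\mathcal{F}_N,\theta^0_N\}_{N\ge1}$, a sequence of $(C,d)$-regular MP instances. Then the following limits hold for each $m\ge1$ and $t>0$:

$$\lim_{N\to\infty}\mathbb{E}\big[(\theta^t_i)^m\big] = \mathbb{E}\big[(\mu_t+Z_t)^m\big]\quad\text{if } i\in C_N, \tag{3.15}$$

$$\lim_{N\to\infty}\mathbb{E}\big[(\theta^t_i)^m\big] = \mathbb{E}\big[(Z_t)^m\big]\quad\text{otherwise}, \tag{3.16}$$

where $Z_t\sim\mathsf{N}(0,\tau_t^2)$.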
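Proof. Fix $j\neq i$. We prove by induction over $t$ that, for all $t\ge0$ and $m\ge1$:

$$\lim_{N\to\infty}\mathbb{E}\big[(\xi^{t+1}_{i\to j})^m\big] = \mathbb{E}\big[(\mu_{t+1}+Z_{t+1})^m\big]\quad\text{if } i\in C_N, \tag{3.17}$$

$$\lim_{N\to\infty}\mathbb{E}\big[(\xi^{t+1}_{i\to j})^m\big] = \mathbb{E}\big[(Z_{t+1})^m\big]\quad\text{otherwise}, \tag{3.18}$$

$$\lim_{N\to\infty}\frac{1}{|C_N|}\sum_{k\in C_N\setminus j}(\xi^{t+1}_{k\to j})^m = \mathbb{E}\big[(\mu_{t+1}+Z_{t+1})^m\big], \tag{3.19}$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{k\in[N]\setminus(C_N\cup j)}(\xi^{t+1}_{k\to j})^m = \mathbb{E}\big[(Z_{t+1})^m\big], \tag{3.20}$$

where Eqs. (3.19) and (3.20) hold in probability. For $t\ge1$, denote by $\mathfrak{F}_t$ the $\sigma$-algebra generated by $A^0,\dots,A^{t-1}$. For convenience of notation, we write $\bar{A}^t_{ij}$ for the centered version of $A^t_{ij}$. Hence $\bar{A}^t_{ij} = A^t_{ij}-\lambda/\sqrt{N}$ if both $i,j\in C_N$, and $\bar{A}^t_{ij} = A^t_{ij}$ otherwise. First consider the case where the index $i\in C_N$. Then we have, for any $j\neq i$: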


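$$\lim_{N\to\infty}\mathbb{E}\big[\xi^{t+1}_{i\to j}\,\big|\,\mathfrak{F}_t\big] = \lim_{N\to\infty}\mathbb{E}\Big[\sum_{\ell\in C_N\setminus\{i,j\}}A^t_{\ell i}f(\xi^t_{\ell\to i},t)+\sum_{\ell\in[N]\setminus(C_N\cup\{j\})}A^t_{\ell i}f(\xi^t_{\ell\to i},t)\,\Big|\,\mathfrak{F}_t\Big] = \lambda\kappa\,\mathbb{E}\big[f(\mu_t+Z_t,t)\big] = \mu_{t+1},$$

where $Z_t\sim\mathsf{N}(0,\tau_t^2)$. Here the second equality holds in probability and follows from the induction hypothesis, and the third holds by definition. Considering the variance, we have:

$$\lim_{N\to\infty}\mathrm{Var}\big[\xi^{t+1}_{i\to j}\,\big|\,\mathfrak{F}_t\big] = \lim_{N\to\infty}\mathbb{E}\Big[\sum_{\ell\in[N]\setminus\{i,j\}}\big(\bar{A}^t_{\ell i}f(\xi^t_{\ell\to i},t)\big)^2\,\Big|\,\mathfrak{F}_t\Big] = \lim_{N\to\infty}\frac{1}{N}\sum_{\ell\in[N]\setminus\{i,j\}}f(\xi^t_{\ell\to i},t)^2 = \mathbb{E}\big[f(Z_t,t)^2\big] = \tau^2_{t+1},$$

where the penultimate equality holds in probability and follows from the induction hypothesis, and the last equality holds by definition.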

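Notice that, conditionally on $\mathfrak{F}_t$, the difference $\xi^{t+1}_{i\to j}-\mathbb{E}\big[\xi^{t+1}_{i\to j}\,\big|\,\mathfrak{F}_t\big]$ is a sum of independent random variables. We show that, in probability, the Lindeberg condition for the central limit theorem holds. By the induction hypothesis we have, in probability:

$$\lim_{N\to\infty}\frac{1}{N}\sum_{\ell\in[N]\setminus\{i,j\}}f(\xi^t_{\ell\to i},t)^4 = \mathbb{E}\big[f(Z_t,t)^4\big].$$

Using this, we have, for any $\epsilon>0$:

$$\sum_{\ell\in[N]\setminus\{i,j\}}\mathbb{E}\Big[\big(\bar{A}^t_{\ell i}f(\xi^t_{\ell\to i},t)\big)^2\,\mathbb{I}\big(|\bar{A}^t_{\ell i}f(\xi^t_{\ell\to i},t)|\ge\epsilon\big)\,\Big|\,\mathfrak{F}_t\Big]\le\frac{1}{\epsilon^2}\sum_{\ell\in[N]\setminus\{i,j\}}\mathbb{E}\Big[\big(\bar{A}^t_{\ell i}f(\xi^t_{\ell\to i},t)\big)^4\,\Big|\,\mathfrak{F}_t\Big]\le\frac{C_6\rho^2}{\epsilon^2N}\cdot\frac{1}{N}\sum_{\ell\in[N]\setminus\{i,j\}}f(\xi^t_{\ell\to i},t)^4\to0,$$

using the induction hypothesis and Lemma A.2. The constant $C_6$ here comes from the leading factors in Lemma A.2. It follows from Lemma A.1 that, for any bounded function $h:\mathbb{R}\to\mathbb{R}$ with bounded first, second and third derivatives,

$$\lim_{N\to\infty}\mathbb{E}\big[h(\xi^{t+1}_{i\to j})\,\big|\,\mathfrak{F}_t\big] = \mathbb{E}\big[h(\mu_{t+1}+Z_{t+1})\big]\quad\text{in probability}.$$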

For $m\ge3$, the functions $h_+(x) = (x^m)_+$ and $h_-(x) = (x^m)_-$ can be approached pointwise by a sequence of bounded functions with bounded first, second and third derivatives. Moreover, $\mathbb{E}\big[(\xi^{t+1}_{i\to j})^m\,\big|\,\mathfrak{F}_t\big]$ is integrable by definition. Hence we have:

$$\lim_{N\to\infty}\mathbb{E}\big[(\xi^{t+1}_{i\to j})^m\,\big|\,\mathfrak{F}_t\big] = \mathbb{E}\big[(\mu_{t+1}+Z_{t+1})^m\big]\quad\text{in probability}.$$

By the tower property of conditional expectation and Lemma 3.5, the expectations also converge, yielding the induction claim Eq. (3.17). Applying Chebyshev's inequality and Lemma 3.6 to the sequence $\big\{\sum_{k\in C_N\setminus j}(\xi^{t+1}_{k\to j})^m/|C_N|\big\}_{N}$, we obtain the induction claim Eq. (3.19).

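We now turn to the case $i\notin C_N$. By Eq. (3.1):

$$\lim_{N\to\infty}\mathbb{E}\big[\xi^{t+1}_{i\to j}\,\big|\,\mathfrak{F}_t\big] = \lim_{N\to\infty}\mathbb{E}\Big[\sum_{\ell\in[N]\setminus\{i,j\}}A^t_{\ell i}f(\xi^t_{\ell\to i},t)\,\Big|\,\mathfrak{F}_t\Big] = 0,$$

since all the entries $A^t_{\ell i}$ have zero mean when $i\notin C_N$. For the variance we compute:

$$\lim_{N\to\infty}\mathrm{Var}\big[\xi^{t+1}_{i\to j}\,\big|\,\mathfrak{F}_t\big] = \lim_{N\to\infty}\mathbb{E}\Big[\sum_{\ell\in[N]\setminus\{i,j\}}\big(A^t_{\ell i}f(\xi^t_{\ell\to i},t)\big)^2\,\Big|\,\mathfrak{F}_t\Big] = \lim_{N\to\infty}\frac{1}{N}\sum_{\ell\in[N]\setminus\{i,j\}}f(\xi^t_{\ell\to i},t)^2 = \mathbb{E}\big[f(Z_t,t)^2\big] = \tau^2_{t+1}.$$

The penultimate equality holds in probability by the induction hypothesis, and the last equality holds by definition. Proceeding exactly as before, we obtain the induction claim Eq. (3.18) and the claim Eq. (3.20).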

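The base case is simpler, since for $t = 0$ the $\sigma$-algebra $\mathfrak{F}_0$ is taken to be trivial. When $i\in C_N$ we have:

$$\lim_{N\to\infty}\mathbb{E}\big[\xi^1_{i\to j}\big] = \lim_{N\to\infty}\mathbb{E}\Big[\sum_{\ell\in C_N\setminus\{i,j\}}A^0_{\ell i}f(1,0)+\sum_{\ell\in[N]\setminus(C_N\cup\{j\})}A^0_{\ell i}f(1,0)\Big] = \lambda\kappa f(1,0) = \mu_1,$$

and for the variance:

$$\lim_{N\to\infty}\mathrm{Var}\big[\xi^1_{i\to j}\big] = \lim_{N\to\infty}\mathbb{E}\Big[\sum_{\ell\in[N]\setminus\{i,j\}}\big(\bar{A}^0_{\ell i}f(1,0)\big)^2\Big] = \lim_{N\to\infty}\frac{1}{N}\sum_{\ell\in[N]\setminus\{i,j\}}f(1,0)^2 = f(1,0)^2 = \tau_1^2,$$

by definition. It follows from the central limit theorem that $\xi^1_{i\to j}\Rightarrow\mathsf{N}(\mu_1,\tau_1^2)$ when $i\in C_N$. A very similar argument yields $\xi^1_{i\to j}\Rightarrow\mathsf{N}(0,\tau_1^2)$ when $i\notin C_N$. Eqs. (3.17) and (3.18) then follow for $t = 0$ using Lemma 3.5. Eqs. (3.19) and (3.20) follow for $t = 0$ using, in addition, Chebyshev's inequality and Lemma 3.6.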

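The proofs for $\xi^t_i$ follow from essentially the same argument, except that the required sums are modified to include the vertex $j$. Asymptotically in $N$, this has no effect on the result, and we obtain the following limits:

$$\lim_{N\to\infty}\mathbb{E}\big[(\xi^t_i)^m\big] = \mathbb{E}\big[(\mu_t+Z_t)^m\big]\quad\text{if } i\in C_N, \tag{3.21}$$

$$\lim_{N\to\infty}\mathbb{E}\big[(\xi^t_i)^m\big] = \mathbb{E}\big[(Z_t)^m\big]\quad\text{otherwise}, \tag{3.22}$$

$$\lim_{N\to\infty}\frac{1}{|C_N|}\sum_{i\in C_N}(\xi^t_i)^m = \mathbb{E}\big[(\mu_t+Z_t)^m\big]\quad\text{in probability}, \tag{3.23}$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i\in[N]\setminus C_N}(\xi^t_i)^m = \mathbb{E}\big[(Z_t)^m\big]\quad\text{in probability}. \tag{3.24}$$

The result follows from Eqs. (3.21), (3.22) and Proposition 3.4. □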
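Proposition 3.7 expresses the limits through the state evolution pair $(\mu_t,\tau_t)$. The sketch below evaluates this pair by Gauss-Hermite quadrature for polynomial $f(\cdot,t)$. It assumes, as in the proof above, that Eqs. (2.4)-(2.5) read $\mu_{t+1} = \lambda\kappa\,\mathbb{E}[f(\mu_t+Z_t,t)]$ and $\tau_{t+1}^2 = \mathbb{E}[f(Z_t,t)^2]$ with $Z_t\sim\mathsf{N}(0,\tau_t^2)$. It starts from the base-case values $\mu_1 = \lambda\kappa f(1,0)$ and $\tau_1^2 = f(1,0)^2$ obtained in the proof. The function names are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from numpy.polynomial.polynomial import polyval

# Probabilists' Gauss-Hermite rule: sum(w * g(x)) ~ integral of g(x) exp(-x^2/2) dx.
_NODES, _WEIGHTS = hermegauss(80)
_WEIGHTS = _WEIGHTS / math.sqrt(2.0 * math.pi)   # now a rule for E[g(Z)], Z ~ N(0, 1)


def gauss_expect(g, mean, sd):
    """E[g(mean + sd * Z)] for Z ~ N(0, 1); exact for polynomials of degree < 160."""
    return float(np.sum(_WEIGHTS * g(mean + sd * _NODES)))


def state_evolution(lam_kappa, coeffs_by_t):
    """Return ([mu_1, ..., mu_T], [tau_1, ..., tau_T]), T = len(coeffs_by_t).

    coeffs_by_t[t] holds the coefficients q^t_0, ..., q^t_d of f(., t)."""
    f0 = float(polyval(1.0, coeffs_by_t[0]))    # f(1, 0): all messages start at 1
    mus, taus = [lam_kappa * f0], [abs(f0)]
    for coeffs in coeffs_by_t[1:]:
        f = lambda x, c=coeffs: polyval(x, c)
        mu, tau = mus[-1], taus[-1]
        mus.append(lam_kappa * gauss_expect(f, mu, tau))
        taus.append(math.sqrt(gauss_expect(lambda x: f(x) ** 2, 0.0, tau)))
    return mus, taus
```

With the linear choice $f(x,t) = x$, the recursion gives $\tau_t = 1$ and $\mu_t = (\lambda\kappa)^t$. This is the geometric growth that the optimized nonlinearities of Section 3.2 improve upon.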

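We can now prove Lemma 2.2. For brevity, we show only Eq. (2.7), as the argument for Eq. (2.6) is analogous. To show Eq. (2.7), it suffices to show that, for any subsequence $\{N_k\}$, there exists a refinement $\{N'_k\}$ such that:

$$\lim_{k\to\infty}\frac{1}{N'_k}\sum_{i\in[N'_k]\setminus C_{N'_k}}\psi(\theta^t_i) = \mathbb{E}\big[\psi(Z_t)\big]\quad\text{a.s.} \tag{3.25}$$

Fix a subsequence $\{N_k\}$. By Chebyshev's inequality, Lemma 3.6 and Proposition 3.7, there exists a refinement $\{N_k(1)\}\subseteq\{N_k\}$ such that:

$$\lim_{k\to\infty}\frac{1}{N_k(1)}\sum_{i\in[N_k(1)]\setminus C_{N_k(1)}}\theta^t_i = \mathbb{E}[Z_t]\quad\text{a.s.}$$

By the same argument, for each $m\in\mathbb{N}$, there exists a refinement $\{N_k(m)\}\subseteq\{N_k(m-1)\}$ such that:

$$\lim_{k\to\infty}\frac{1}{N_k(m)}\sum_{i\in[N_k(m)]\setminus C_{N_k(m)}}(\theta^t_i)^m = \mathbb{E}\big[(Z_t)^m\big]\quad\text{a.s.}$$

Let $\{N'_k\}$ be the diagonal sequence $N'_k = N_k(k)$. Then, for all $m\ge1$:

$$\lim_{k\to\infty}\frac{1}{N'_k}\sum_{i\in[N'_k]\setminus C_{N'_k}}(\theta^t_i)^m = \mathbb{E}\big[(Z_t)^m\big]\quad\text{a.s.} \tag{3.26}$$

We define the empirical measure $\mu_N(\cdot)$ as follows:

$$\mu_N(\cdot) = \frac{1}{N}\sum_{i\in[N]\setminus C_N}\delta_{\theta^t_i}(\cdot).$$

Eq. (3.26) guarantees that, almost surely, the moments of $\mu_{N'_k}$ converge to those of $Z_t$. By the moment method, Eq. (3.25) follows, and we obtain the required result of Eq. (2.7).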
3.2 Proof of Lemma 2.3

This section is devoted to proving Lemma 2.3. In particular, our derivation will justify the construction of polynomials in the statement of the lemma, cf. Eq. (2.8).

We will first consider the state evolution recursion (2.4), (2.5) for a general sequence of functions $\{f(\cdot,t)\}_{t\ge0}$ (not necessarily polynomials). Since we are only interested in the ratio $\mu_t/\tau_t$, there is no loss of generality in assuming that $f$ is normalized so that $\tau_t = 1$ for all $t$, i.e. $\mathbb{E}[f(Z,t)^2] = 1$ for all $t$.

Lemma 3.8. Let $\mu_t$ be defined recursively for all $t\ge0$ by letting

$$\mu_{t+1} = \lambda\kappa\,e^{\mu_t^2/2},\qquad\mu_0 = 1. \tag{3.27}$$

Further, given a sequence of functions $f = \{f(\cdot,t)\}_{t\ge0}$ such that $\mathbb{E}[f(Z,t)^2] = 1$ for all $t$, let $\mu^{(f)}_t$ be the corresponding state evolution sequence defined by

$$\mu^{(f)}_{t+1} = \lambda\kappa\,\mathbb{E}\big[f(\mu^{(f)}_t+Z,t)\big],\qquad\mu^{(f)}_0 = 1.$$

Then $\mu^{(f)}_t\le\mu_t$ for all $t$, with equality for $t>0$ if and only if

$$f(z,\ell) = e^{\mu_\ell z-\mu_\ell^2}\qquad\text{for } 0\le\ell<t.$$

Further, $\lim_{t\to\infty}\mu_t = \infty$ if and only if $\lambda\kappa>e^{-1/2}$.

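Proof. For the initial condition $\mu_0 = 1$, $\tau_0 = 0$, the choice of normalization means that we need only fix $f(1,0) = 1$, which is satisfied by the choice above. We have, for $Z\sim\mathsf{N}(0,1)$ and $\ell>0$:

$$\mu^{(f)}_{\ell+1} = \lambda\kappa\int f(z,\ell)\,\frac{e^{-(z-\mu^{(f)}_\ell)^2/2}}{\sqrt{2\pi}}\,dz = \lambda\kappa\,e^{-(\mu^{(f)}_\ell)^2/2}\,\mathbb{E}\big[f(Z,\ell)\,e^{\mu^{(f)}_\ell Z}\big]\le\lambda\kappa\,e^{-(\mu^{(f)}_\ell)^2/2}\,\mathbb{E}\big[f(Z,\ell)^2\big]^{1/2}\,\mathbb{E}\big[e^{2\mu^{(f)}_\ell Z}\big]^{1/2},$$

where the inequality follows from Cauchy-Schwarz. By our choice of normalization, and since $\mathbb{E}[e^{2\mu Z}] = e^{2\mu^2}$, we obtain:

$$\mu^{(f)}_{\ell+1}\le\lambda\kappa\,e^{(\mu^{(f)}_\ell)^2/2}.$$

The inequality holds with equality only for the choice $f(z,\ell) = e^{\mu_\ell z-\mu_\ell^2}$. Hence $\mu^{(f)}_t = \mu_t$ only for this choice.

The last statement (namely, $\mu_t\to\infty$ if and only if $\lambda\kappa>e^{-1/2}$) is a simple calculus exercise. □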
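Both conclusions of Lemma 3.8 can be checked numerically. The sketch below iterates Eq. (3.27) and exhibits the dichotomy at $\lambda\kappa = e^{-1/2}\approx0.607$: below it the orbit stays bounded, above it the orbit escapes to infinity. It also verifies by quadrature that $f(z) = e^{\mu z-\mu^2}$ is normalized and attains the Cauchy-Schwarz bound $e^{\mu^2/2}$. The cap on $\mu$ only avoids floating-point overflow in `math.exp` and is not part of the lemma. The helper names are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

THRESHOLD = math.exp(-0.5)        # lambda * kappa above this value => mu_t -> infinity


def ideal_mu(lam_kappa, t_max, cap=30.0):
    """Iterate mu_{t+1} = lam_kappa * exp(mu_t**2 / 2) from mu_0 = 1 (Eq. (3.27)).

    Stops once mu exceeds `cap`, only to avoid overflow of math.exp."""
    mus = [1.0]
    for _ in range(t_max):
        if mus[-1] > cap:
            break
        mus.append(lam_kappa * math.exp(mus[-1] ** 2 / 2.0))
    return mus


def optimal_f_moments(mu, n_nodes=100):
    """For f(z) = exp(mu z - mu^2), return (E[f(Z)^2], E[f(mu + Z)]) with Z ~ N(0, 1).

    Lemma 3.8 predicts (1, exp(mu^2 / 2)): f is normalized and attains the bound."""
    x, w = hermegauss(n_nodes)
    w = w / math.sqrt(2.0 * math.pi)
    f = lambda z: np.exp(mu * z - mu ** 2)
    return float(np.sum(w * f(x) ** 2)), float(np.sum(w * f(mu + x)))
```

For example, `ideal_mu(0.6, 200)` decreases monotonically from 1 towards the smaller fixed point. In contrast, `ideal_mu(0.62, 200)` passes slowly through the bottleneck near $\mu = 1$ and then blows up within a few more steps.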

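We are now in a position to prove Lemma 2.3.

Proof of Lemma 2.3. Let $\{\mu_t\}_{t\ge0}$ be given as per Eq. (3.27), and define $t_* = \inf\{t:\mu_t>2M\}$. The condition $\lambda\kappa>e^{-1/2}$ ensures that $t_*$ is finite. For a fixed $d$, define the mappings $g,g_d:\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ by letting $g(z,\mu) = e^{\mu z}$ and $g_d(z,\mu) = \sum_{k=0}^{d}(\mu z)^k/k!$. Then, since the Taylor series of the exponential has infinite radius of convergence, we have, for all $z,\mu\in\mathbb{R}$,

$$\lim_{d\to\infty}g_d(z,\mu) = g(z,\mu). \tag{3.28}$$

In the rest of this proof we will, for the sake of simplicity, write $\hat{g}$ for $g_d$, omitting the subscript $d$. For any $\mu\in\mathbb{R}$ and $Z\sim\mathsf{N}(0,1)$, we define:

$$G(\mu) = \frac{1}{L}\,\mathbb{E}\big[\lambda\kappa\,g(\mu+Z,\mu)\big],\qquad\hat{G}(\mu) = \frac{1}{\hat{L}}\,\mathbb{E}\big[\lambda\kappa\,\hat{g}(\mu+Z,\mu)\big],$$

where $L = \big(\mathbb{E}[g(Z,\mu)^2]\big)^{1/2}$ and $\hat{L} = \big(\mathbb{E}[\hat{g}(Z,\mu)^2]\big)^{1/2}$. We first obtain that:

$$|\hat{G}(\mu)-G(\mu)|\le G(\mu)\,\frac{|L-\hat{L}|}{\hat{L}}+\frac{\lambda\kappa}{\hat{L}}\,\Big|\mathbb{E}\big[\hat{g}(\mu+Z,\mu)-g(\mu+Z,\mu)\big]\Big|. \tag{3.29}$$

Note that $|g(z,\mu)|,|\hat{g}(z,\mu)|\le e^{|\mu z|}$. It follows from Eq. (3.28) and dominated convergence that $\hat{L}\to L$ and $\mathbb{E}[\hat{g}(\mu+Z,\mu)]\to\mathbb{E}[g(\mu+Z,\mu)]$ as $d\to\infty$.

By compactness, for any $\delta>0$ we can choose $d_*<\infty$ such that $|\hat{G}(\mu)-G(\mu)|\le\delta$ for $0\le\mu\le2M$. Here $d_*$ is a function of $\delta$ and $M$. We can now rewrite the state evolution recursions as follows:

$$\mu_{\ell+1} = G(\mu_\ell),\qquad\hat{\mu}_{\ell+1} = \hat{G}(\hat{\mu}_\ell),$$

with $\hat{\mu}_0 = \mu_0 = 1$. Define $\Delta_\ell = |\mu_\ell-\hat{\mu}_\ell|$. We use the facts that $G(\mu)$ is convex and that $G'(\mu) = \lambda\kappa\mu e^{\mu^2/2}$ is bounded by $M' = G'(2M)$ on $[0,2M]$. This gives:

$$\hat{\mu}_{\ell+1} = \hat{G}(\hat{\mu}_\ell)\ge G(\hat{\mu}_\ell)-\delta\ge G(\mu_\ell)-M'|\mu_\ell-\hat{\mu}_\ell|-\delta = \mu_{\ell+1}-M'\Delta_\ell-\delta.$$

Since $\hat{\mu}_{\ell+1}\le\mu_{\ell+1}$ by Lemma 3.8, this implies:

$$\Delta_{\ell+1}\le M'\Delta_\ell+\delta.$$

By induction, since $\Delta_0 = |\hat{\mu}_0-\mu_0| = 0$, we obtain:

$$\Delta_\ell\le\Big(\sum_{k=0}^{\ell-1}(M')^k\Big)\delta = \frac{(M')^\ell-1}{M'-1}\,\delta.$$

Now, choosing $d_*$ such that $\delta = M(M'-1)/\big(2((M')^{t_*}-1)\big)$, we obtain $\Delta_{t_*}\le M/2$, implying $\hat{\mu}_{t_*}\ge3M/2>M$. □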
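The proof replaces the optimal nonlinearity $e^{\mu z}$ by its degree-$d$ Taylor polynomial. The sketch below computes $\hat G(\mu)$ by Gauss-Hermite quadrature, which is exact for these polynomials up to rounding, and compares it with the closed form $G(\mu) = \lambda\kappa e^{\mu^2/2}$. It illustrates Eqs. (3.28)-(3.29) numerically for particular parameter values only, not the uniform bound over $[0,2M]$ used in the proof. The function names are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

_X, _W = hermegauss(120)
_W = _W / math.sqrt(2.0 * math.pi)         # quadrature rule for E[.] under N(0, 1)


def g_d(z, mu, d):
    """Degree-d Taylor polynomial of g(z, mu) = exp(mu z)."""
    return sum((mu * z) ** k / math.factorial(k) for k in range(d + 1))


def G_hat(mu, lam_kappa, d):
    """hat G(mu) = lam_kappa E[g_d(mu + Z, mu)] / hat L,  hat L = E[g_d(Z, mu)^2]^(1/2)."""
    L_hat = math.sqrt(float(np.sum(_W * g_d(_X, mu, d) ** 2)))
    return lam_kappa * float(np.sum(_W * g_d(mu + _X, mu, d))) / L_hat


def G(mu, lam_kappa):
    """Untruncated map: lam_kappa E[g(mu + Z, mu)] / L = lam_kappa exp(mu^2 / 2)."""
    return lam_kappa * math.exp(mu ** 2 / 2.0)
```

Increasing `d` drives `G_hat` to `G`. Through the recursion $\hat\mu_{\ell+1} = \hat G(\hat\mu_\ell)$, this is what makes a finite degree $d_*$ sufficient for finitely many iterations.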
3.3 Proof of Lemma 2.4

Let $A|_{\hat{C}_N}$ be the matrix $A$ restricted to the rows (and columns) in $\hat{C}_N$. Also let $v\in\mathbb{R}^{|\hat{C}_N|}$ denote the unit-norm indicator vector on $\hat{C}_N\cap C_N$, i.e.

$$v_i = \begin{cases}1/\sqrt{|\hat{C}_N\cap C_N|} & \text{if } i\in\hat{C}_N\cap C_N,\\ 0 & \text{otherwise.}\end{cases}$$

Define $A|^{c}_{\hat{C}_N}$ to be the centered matrix

$$A|^{c}_{\hat{C}_N} = A|_{\hat{C}_N}-\frac{\lambda\,|\hat{C}_N\cap C_N|}{\sqrt{N}}\,vv^{\mathsf{T}}.$$

Throughout this proof we assume for simplicity that $\kappa_N = \kappa$, i.e. $|C_N| = \kappa\sqrt{N}$ for some constant $\kappa$ independent of $N$. The case of $\kappa_N$ depending on $N$ with $\lim_{N\to\infty}\kappa_N = \kappa$ can be handled by a vanishing shift in the constants presented.

Assume that $\hat{C}_N$ is a fixed subset selected independently of $A$. Then the matrix $A|^{c}_{\hat{C}_N}$ has independent, zero-mean entries which are subgaussian with scale factor $\rho/N$. Let $u$ denote the principal eigenvector of $A|_{\hat{C}_N}$. The set $B_N\subseteq[N]$ consists of the indices of the $|C_N|$ entries of $u$ with largest absolute value.

We first show that the set $B_N$ contains a large fraction of $C_N$. By the condition on $\hat{C}_N$, we have $\|A|_{\hat{C}_N}-A|^{c}_{\hat{C}_N}\|_2\ge\lambda\kappa(1-\epsilon)$. By Lemma A.3 in Appendix A, for a fixed $\delta$, we have $\|A|^{c}_{\hat{C}_N}\|_2\le\lambda(1-\epsilon)\kappa\delta$ with high probability. The failure probability is the one given by Lemma A.3, with $\xi = \delta^2/32\rho\epsilon$.

Using matrix perturbation theory, we get

$$\|u-v\|_2\le\sqrt{2}\,\sin\theta(u,v)\le\sqrt{2}\,\frac{\|A|^{c}_{\hat{C}_N}\|_2}{\lambda(1-\epsilon)\kappa-\|A|^{c}_{\hat{C}_N}\|_2}\le1.9\,\delta,$$

where the second inequality follows by the $\sin\theta$ theorem [DK70].

We run $t_{**} = O(\log N/\delta)$ iterations of the power method, with initialization $u^0 = (1,1,\dots,1)^{\mathsf{T}}/|\hat{C}_N|^{1/2}$. By the same perturbation argument, there is a $\Theta(1)$ gap between the largest and second largest eigenvalues of $A|_{\hat{C}_N}$, and $\langle u^0,u\rangle\ge N^{-1}$. It follows by a standard argument that the output $u^{t_{**}}$ of the power method approximates the leading eigenvector $u$ with a fixed error $\|u-u^{t_{**}}\|\le\delta/10$. By the triangle inequality, this implies $\|u^{t_{**}}-v\|\le2\delta$. Let $u^{\perp}$ (resp. $u^{\parallel}$) denote the projection of $u^{t_{**}}$ orthogonal to (resp. onto) $v$. Thus we have $\|u^{\perp}\|_2^2\le4\delta^2$. It follows that at most $36\delta^2|\hat{C}_N\cap C_N|$ entries of $u^{\perp}$ have magnitude exceeding $(1/3)|\hat{C}_N\cap C_N|^{-1/2}$. Notice that $u^{\parallel} = u^{t_{**}}-u^{\perp}$ and that $u^{\parallel}$ is a multiple of $v$. Consequently, we can assume that $B_N$ is selected using $u^{t_{**}}$ instead of $u$. This observation, together with the bound above, guarantees that at most $36\delta^2|\hat{C}_N\cap C_N|$ entries are misclassified, i.e.

$$|B_N\cap C_N|\ge(1-36\delta^2)\,|\hat{C}_N\cap C_N| \tag{3.30}$$

$$\ge(1-\delta)(1-\epsilon)\,|C_N|. \tag{3.31}$$

Here we assume $\delta<1/36$.
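A minimal sketch of the spectral step analyzed above follows. It assumes $A$ is a symmetric NumPy array and $\hat C_N$ is given as a set of vertex indices. The sketch runs the power method on $A|_{\hat C_N}$ from the normalized all-ones vector and keeps the $|C_N|$ entries of largest absolute value. The function names are hypothetical. The power method converges to the eigenvector of largest absolute eigenvalue, which in the regime considered here is the principal (clique) eigenvector.

```python
import numpy as np


def power_method(M, iters):
    """Approximate the leading eigenvector of a symmetric matrix M, starting from
    the normalized all-ones vector u^0 used in the text."""
    u = np.ones(M.shape[0]) / np.sqrt(M.shape[0])
    for _ in range(iters):
        w = M @ u
        u = w / np.linalg.norm(w)
    return u


def select_B(A, C_hat, clique_size, iters):
    """B_N: indices (in [N]) of the `clique_size` largest |entries| of the approximate
    leading eigenvector of A restricted to the rows and columns in C_hat."""
    idx = np.asarray(sorted(C_hat))
    u = power_method(A[np.ix_(idx, idx)], iters)
    top = np.argsort(-np.abs(u))[:clique_size]
    return set(idx[top].tolist())
```

Each iteration costs one matrix-vector product with the $|\hat C_N|\times|\hat C_N|$ submatrix. Thanks to the $\Theta(1)$ spectral gap, a logarithmic number of iterations suffices.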
The above argument proves the desired result for any fixed set $\hat{C}_N$ independent of $A$, for $N$ large enough. The failure probability is the one given by Lemma A.3, which involves $\xi = \delta^2/32\rho\epsilon$ and a universal constant $c$. To extend the result to all sets $\hat{C}_N$ (possibly dependent on the matrix $A$), we take a union bound over all possible choices of $\hat{C}_N$. For all $N$ large enough, the number $\#_N(\epsilon)$ of choices satisfying the conditions of Lemma 2.4 is at most $2e^{N\,O(\epsilon\log(1/\epsilon))}$.

Choose $\delta$ as a suitable power of $\epsilon$. The union bound then shows that, for $\epsilon$ small enough, Eq. (3.31) holds with probability at least $1-4e^{-\nu'N}$, where $\nu'(\rho,\epsilon)\to\infty$ as $\epsilon\to0$.
Recall that the score $C_\rho(i)$ for a vertex $i$ is given by:

$$C_\rho(i) = \frac{1}{|C_N|}\sum_{j\in B_N}W'_{ij} = \frac{1}{|C_N|}\Big(\sum_{j\in C_N}W'_{ij}+\sum_{j\in B_N\setminus C_N}W'_{ij}-\sum_{j\in C_N\setminus B_N}W'_{ij}\Big),$$

where $W'_{ij} = W_{ij}\,\mathbb{I}\{|W_{ij}|\le\rho\}$. The truncated variables are subgaussian with parameters $\lambda'_\ell$, $\rho'$, for $\ell = 0,1$ according as $W_{ij}\sim Q_0$ or $Q_1$. Here $\lambda'_\ell$ denote the means after truncation. Also, for $\rho$ large enough, we may take $\lambda'_1$ as close to $\lambda$, and $\lambda'_0$ as close to $0$, as desired, with $\rho'\le2\rho$. The sum $\big(\sum_{j\in C_N}W'_{ij}\big)/|C_N|$ is subgaussian with parameters $\lambda'_\ell$, $\rho'/|C_N|$. Moreover, by Eq. (3.31), each of $B_N\setminus C_N$ and $C_N\setminus B_N$ contains at most $(\delta+\epsilon)|C_N|$ vertices, each contributing at most $\rho$ in absolute value. Hence, with high probability,

$$C_\rho(i)\ge\lambda'_1-2\rho(\delta+\epsilon)-o(1)\ \text{ if } i\in C_N,\qquad C_\rho(i)\le\lambda'_0+2\rho(\delta+\epsilon)+o(1)\ \text{ otherwise}.$$

Choosing $\epsilon$ (and hence $\delta$) small enough compared to $\lambda/\rho$ yields the desired result.
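The cleanup step analyzed above can be sketched as follows: truncate the edge labels at level $\rho$, average over $B_N$, and keep the vertices whose score exceeds a threshold. Several ingredients are illustrative assumptions rather than the exact choices of Algorithm 1: the threshold $\lambda/2$ in the example, the $\pm1$ test instance, and the function names.

```python
import numpy as np


def cleanup_scores(W, B, rho):
    """Scores C_rho(i) = (1/|B|) * sum_{j in B} W_ij 1{|W_ij| <= rho} for every vertex i."""
    cols = np.asarray(sorted(B))
    Wb = W[:, cols]
    truncated = np.where(np.abs(Wb) <= rho, Wb, 0.0)   # W'_ij = W_ij 1{|W_ij| <= rho}
    return truncated.sum(axis=1) / len(cols)


def cleanup(W, B, rho, threshold):
    """Vertices whose truncated score is at least `threshold` (hypothetical choice)."""
    return set(np.flatnonzero(cleanup_scores(W, B, rho) >= threshold).tolist())
```

Truncation bounds the influence of each misclassified vertex of $B_N$ by $\rho$. This is exactly what produces the $2\rho(\delta+\epsilon)$ slack in the bounds above, and it keeps a few very large labels from swamping the average.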

4 The sparse graph case: Algorithm and proof of Theorem 2

In this section we consider the general hidden set problem on locally tree-like graphs, as defined in the introduction. We will introduce the reconstruction algorithm and the basic idea of its analysis. A formal proof of Theorem 2, building on these ideas, is presented in Section 5.

Throughout this section we consider a sequence of locally tree-like graphs $\{G_N\}_{N\ge1}$, $G_N = ([N],E_N)$, indexed by the number of vertices $N$. For notational simplicity, we shall assume that these graphs are $(\Delta+1)$-regular, although most of the ideas can be easily generalized. We shall further associate to each vertex $i$ a binary variable $X_i$, with $X_i = 1$ if $i\in C_N$ and $X_i = 0$ otherwise. We write $X = (X_i)_{i\in[N]}$ for the vector of these variables. It is mathematically convenient to work with a slightly different model for the vertex labels $X_i$: we will assume that the $X_i$ are i.i.d. such that:

$$\mathbb{P}(X_i = 1) = \frac{\kappa}{\sqrt{\Delta}}\Big(1+\frac{\kappa}{\sqrt{\Delta}}\Big)^{-1}.$$

For convenience of exposition, we also define:

Notice that this leads to a set $C_N = \{i\in[N]:X_i = 1\}$ of random size, which concentrates sharply around $N\kappa/\sqrt{\Delta}$. This model differs slightly from the one considered earlier, where $C_N$ is uniformly random and of a fixed size. However, if we condition on the size $|C_N|$, the i.i.d. model reduces to the earlier model. We prove in Appendix B.3 that the results for the i.i.d. model still hold for the earlier model. In view of this, throughout this section we will stick to the i.i.d. model.

In order to motivate the algorithm, consider the conditional distribution of $W$ given $X$, and assume for notational simplicity that $Q_0$, $Q_1$ are discrete distributions. We then have $\mathbb{P}(W|X = x) = \prod_{(i,j)\in E_N}Q_{x_ix_j}(W_{ij})$. Here the subscript $x_ix_j$ denotes the product of $x_i$ and $x_j$. The posterior distribution of $X$ is therefore a Markov random field (pairwise graphical model) on $G_N$:

$$\mathbb{P}(X = x\,|\,W) = \frac{1}{Z(W)}\prod_{(i,j)\in E_N}Q_{x_ix_j}(W_{ij})\prod_{i\in[N]}\mathbb{P}(X_i = x_i).$$

Here $Z(W)$ is an appropriate normalization. Belief propagation (BP) is a heuristic method for estimating the marginal distribution of this posterior; see [WJ08, MM09, KF09] for introductions from several points of view. For the sake of simplicity, we shall describe the algorithm for the case $Q_1 = \delta_{+1}$, $Q_0 = (1/2)\delta_{+1}+(1/2)\delta_{-1}$, whence $W_{ij}\in\{+1,-1\}$. At each iteration $t$, the algorithm updates 'messages' $\gamma^t_{i\to j},\gamma^t_{j\to i}\in\mathbb{R}_+$ for each $(i,j)\in E_N$. As formally clarified below, these messages correspond to 'odds ratios' for vertex $i$ to be in the hidden set.

Starting from $\gamma^0_{i\to j} = 1$ for all $i,j$, messages are updated as follows:

where $\partial i$ denotes the set of neighbors of $i$ in $G_N$. We further compute the vertex quantities $\gamma^t_i$ as

Note that $\gamma^t_i$ is a function of the (labeled) neighborhood $\mathrm{Ball}_{G_N}(i;t)$. The nature of this function is clarified by the next result, an instance of a standard result in the literature on belief propagation [WJ08, MM09, KF09].
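Proposition 4.1. Let $W_{\mathrm{Ball}_{G_N}(i;t)}$ be the set of edge labels in the subgraph $\mathrm{Ball}_{G_N}(i;t)$. If $\mathrm{Ball}_{G_N}(i;t)$ is a tree, then

$$\gamma^t_i = \sqrt{\Delta}\,\frac{\mathbb{P}\big(X_i = 1\,\big|\,W_{\mathrm{Ball}_{G_N}(i;t)}\big)}{\mathbb{P}\big(X_i = 0\,\big|\,W_{\mathrm{Ball}_{G_N}(i;t)}\big)}.$$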

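Given this result, we can attempt to estimate $C_N$ on locally tree-like graphs by running BP for $t$ iterations and then thresholding the resulting odds ratios. In other words, we let

$$\hat{C}_N = \{i\in[N]:\gamma^t_i\ge\sqrt{\Delta}\}. \tag{4.3}$$

By Proposition 4.1, for every vertex $i$ such that $\mathrm{Ball}_{G_N}(i;t)$ is a tree, this rule picks the value of $X_i$ that maximizes the posterior probability $\mathbb{P}(X_i = \cdot\,|\,W_{\mathrm{Ball}_{G_N}(i;t)})$. This in turn minimizes the misclassification rate $\mathbb{P}(i\in C_N,\,i\notin\hat{C}_N)+\mathbb{P}(i\in\hat{C}_N,\,i\notin C_N)$. The resulting error rate is

$$\Big(1-\frac{\kappa}{\sqrt{\Delta}}\Big)\,\mathbb{P}\big(\gamma^t_i\ge\sqrt{\Delta}\,\big|\,X_i = 0\big)+\frac{\kappa}{\sqrt{\Delta}}\,\mathbb{P}\big(\gamma^t_i<\sqrt{\Delta}\,\big|\,X_i = 1\big). \tag{4.4}$$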



In order to characterize this misclassification rate, we let Tree(i) denote the regular 
i-generations with degree (A + 1) at each vertex except the leaves, rooted at vertex o 
and labeled as follows. Each vertex i is labeled with Xi E {0, 1} independently with 
P(Xj = 1) = H/yfK. Each edge (i,j) has label an independent Wij ~ Qi if Xi = Xj = 1 
and Wij ~ Qo otherwise. 

Let ■~f^{xo) a random variable distributed as the odds ratio for Xo = 1 on Tree(t) when 
the true root value is Xo 



f (Xo) = VA ^,^ _„,,, '\ , I^TreeW ~ ^^(^f ree^ = " l^o = Xo) . (4.5) 



The following characterization is a direct consequence of the fact Proposition 14.11 and the 
fact that Gn is locally tree-like. For completeness, we provide a proof in Appendix IB. 21 

Proposition 4.2. Let Cn be the estimated hidden set for the BP rule |^.3| ) after t itera- 
tions. We then have 



lim IeOCjvACatI 



1 - ^)F(f (0) > VA) + ^ff^(f (1) < VA) . 



Further, if Cn is estimated by any t-local algorithm, t/ien lim infjv_j.oo -^ ""^ IE[|CjvACAr|] is 
at least as large as the right-hand side. 

We have therefore reduced the proof of Theorem [2] to controlling the distribution of 
the random variables 7*(0), 7*(1). These can be characterized by a recursion over t. For k 
small we have the following. 

Lemma 4.3. Assume k < l/\/e. Then there exists constants 7=,, < 00, 5* = (5*(k) and 
A* = A*(k) < 00 such that, for all A > A*(k) and all t > 0, we have 

P(7*(l)<57*)>^. 
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For large k, we have instead the following. 

Lemma 4.4. Assume k > l/-v/e. Then there exists c^, = c*(k) > A* = A*(k) < cxd, 
t-t = t*(K, A) < oo such that, for all A > A*(/i;) we have 

1 - ^)p(f (0) > \/A) + ^P(f(l) < VA) < e-=*^. 
V A^ V A 



Lemma 14.31 and Proposition 14.21 together imply one part of Theorem [2l Indeed, for A 
large enough, we have that the misclassification error is ri(A'^/vA), which is the same order 
as choosing a random subset of size kA^/\/A- Similarly, Lemma [13] in conjunction with 
Proposition 14.21 yields the second half of Theorem [2j 

5 Proof of Lemma 14.31 and 14.41 

In this section we prove Lemma 14.31 and 14.41 that are the key technical results leading to 
Theorem [2j We start by establishing some facts that are useful in both cases and then pass 
to the proofs of the two lemmas. 

5.1 Setup: Recursive construction of 7* (0), 7* (1) 

As per Proposition 14.11 the likelihood ratio 7*(xo) can be computed by applying the BP 
recursion on the tree Tree(i). In order to set up this recursion, let Tree(i) denote the 
t-generation tree, with root of degree A, and other non-leaf vertices of degree A + 1. The 
tree Tree(t) carries labels Xj, Wij in the same fashion as Tree(t). Thus, Tree(i) differs from 
Tree(t) only in the root degree. We then let 

7*(Xo) = VA — — ^ , I^TreeW ~ lP(W^Tree{t) = " l^o = Xo) . (5.1) 

It is then easy to obtain the distributional recursion (here and below = indicates equality 
in distribution) 




1 + {1 + A',)^j{xe)/VA\ ^ ^^_2) 



(5.3) 



1 + 7K^^)/VA ^ 
1 + {1 + Al)jj{x,)/VA' 



l + -/j{xe)/VA 



This recursion is initialized with 7'^(0) = 7''(1) = k. Here 7^(0), 7^(1), i £ [A] are A i.i.d. 
copies of 7t(0),7((l), A^, i S [A], are i.i.d. uniform in {±1}, Xi, i G [A] are i.i.d. Bernoulli 
with P(xi = 1) = k/\/A. Finally A\ = A\iixi = Q and A\ = liixi = 1. 

The distribution of 7*(0), 7*(1) can then be obtained from the one of 7*(0), 7*(1) as 
follows: 

-...(!) 4 /nYi±ii±4iM(^y (5.5) 
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5.2 Useful estimates 

A first useful fact is the following relation between the moments of 7*(0) and 7*(1)- 

Lemma 5.1. Lei 7(0), 7(1), 7(0), 7(1) be defined as in Eqs. 1^^, ^J^, ([531) and (|53]) . 
Then, for each positive integer a we have: 

E[(7*(0)r]=KE[(7*(l)r-i] 
E[(7*(0)r]=KE[(7*(l)ri]. 



Proof. It suffices to show that: 






where the left-hand side denotes the Radon-Nikodym derivative of P-ytn) with respect to 
P^t(o)- Let i^* denote the posterior probability of Xo = given the labels on Tree(t). Let 
zv*(j;o) be distributed as z/* conditioned on the event Xq = Xo for Xo = 0, 1. In other words 

l^'iXo) = P(Xo = l|VFTree(i)) , VFj.eeW ~ HWrreeit) = ■ \Xo = Xo) . 

By Bayes rule we then have: 



dP, 



u^iO) 



V 



dP^t ~ 1 - k/VA ' 

dPj,t(i) _ 1 - jy* 

dP^t ~ k/\/A' 
Using this and the fact that z^* = (1 + j^/^A)^^ by Eq. (15. ip . we get 

d^i/'(o) _ 1 



dP, 



(l + -f^/VA^{l-K/VA) 



dPu^ii) _ 7VVA 



dP. 



(1 + j'/Vaj k/Va ' 



It follows from this and that the mapping from v^ to the likelihood 7 is bijective and Borel 
that: 

dP^t(i) V1-k/\/A 

= ^. 

Here the last equality follows from the definition of k. A similar argument yields the same 
result for 7*(0) and 7(1). D 

Our next result is a general recursive upper bound on the moments of 7*(1). 
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Lemma 5.2. Consider random variables 7*(0),7*(1),7*(0),7*(1) that satisfy the distribu- 
tional recursions in Eqs. (j5.2p . ()5.3p . (j5.4p and (j5.5p . T/ien we have that, for each t >0: 



Moreover, we also have: 



IE[7*+'(1)] <Kexp(KE[7*(l)]) , 
IE[7*+'(l)']<K2exp(3^E[7*(l)]) , 
E [7*+^(l)^] < K^ exp (10kE[7*(1)]) . 



E[7*(l) 



IE[7*+'(1)] <Kexp(AiE[7*(l)])(^l+ ^ 
E [7*+^(l)^] < K^ exp (3kE[7*(1)]) f 1 + ^^^^ 



E [7*+^(l)^] < K^ exp (10kE[7*(1)]) ( 1 + 



A 
10E[7*(1) 

A 

Proof. Consider the first moment E[7*^^(l)]. By taking expectation of Eq. ()5.3p over 
{Ae,xe}i<e<A, we get 



A 



K 1 + 27^(1)/Va ' 
^/A^ ' VA 1 + 7*(1)/\/A 



+ 



1 ^ tKi) 

Ai + 7*(1)/VA 



< K 



n[i+s*' 



The last inequahty uses the non-negativity of 7*(1) and that k > n. Taking expectation 
over {7^}i<^<Ai and using the inequahty (1 + x) < e^, we get 

E[7*+'(l)]<Kexp(KE[7*(l)]) , 

The claim for E[7*+-'^(l)] follows from the same argument, except we include A + 1 factors 
above and retain only the last. 

Next take the second moment E[7*+-'^(l)]. Using Eq. (j5.3p and proceeding as above we 
get: 
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E[Y+'{iy] = K^E 



^ / Aajix,)/VA 

l\\ i + tK^^)/^ 



1+1 



" Me 



Va^a 



ttn\2 



7*(0) 



l + 7*(0)/VA 



K jg/27*(l)VA + 37*(l)VA 



n A 



(l + 7*(l)/\/A)2 



<k" 



1 + -E[(7*(o))2] + -E ( ^^ — ;. / :1, ^:/ ^ ^ — j 



A 



(1 + 7*(1)/VA) 



<.2(l + lE[y(0)^] + ^E[7*(l)])' 



< K exp 

< K exp 



E(7*(0)2) + 2kE(7*(1)) 
3kE(7*(1)) 



Consider, now, the third moment of 7*^^(1). Proceeding in the same fashion as above we 
obtain that: 



E[7*+i(l)3] =^3eJ] 1 + - 



Aa\x,)/VA 



+ 7*(x£)/\/A 



<kMi + ^E[7*(1)]+|e[(7*(0))2] 

+ ^E /^3^*(1)'/^ + 47*(1)Va3/2 ^ V 



(l + f(l)/VA)3 J 



Since (3z^ + 4z^)/(l + z)^ < 4z when z > 0, and that k > k, we can bound the last term 
above to get: 



E [7*+^(l)^] <^(l + |e [j\0f] + 7£e [7*(1)] 

< K^ exp (3E[7*(0)2] + 7kE[7*(1)]) 

< K^ exp(lOKE[7*(l)]) . 

The bound for the third moment of 7*'^^(1) follows from the same argument with the 
inclusion of the (A + 1)*^ factor. D 

Lemma 5.3. Consider 7*(0),7*(1),7*(0),7*(1) satisfying the recursions Eqs. (I5.2p . ()5.3p . 
()5.4p and (j5.5p . Also let x be Bernoulli with parameter k/vA and A and A be defined like 
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^^ and A*^ as in Eqs. (j5.2p . (j5.3p . T/ien u;e have that, for each t > and m G {1,2,3}; 



E 



E 



log I 1 + 
log I 1 + 



A7*(x)/VA 
I7*(x)/VA ' 



t ('_,\ml 



l + 7*(x)vA 



< 



< 



2E[7*(x) 

2E[7*(a;)'"] 
A™/2 



Proof. Consider the first claim. We have: 

\ ^ V(x)/^/A ' '■ 
1 + 7*(x)a/A 



E 



log 



E 



log 1 



iKx)/^ 



1 

+ 2 



log 1 + 



l + 7*(a;)/vA^ 
7*(x)/^/A 



l + 7*(a;)/VA 



Bounding the first term we get: 
^\ l + 7*(x)A 



xF (log ( 1 - ^:!M^ ) < _^iM ) dx 



< / xP (7*(x)/\/A > e^''^" - ij dx 
E[7* 



< 



< 



„t(j,\m\ foo 



A™/2 7g (e^T'l/m _ ;^), 

3E[7*(x)™] 



AW2 



For the second term, using log(l + z) < z for z > and the positivity of 7*(x): 



E 



log 1 + 



7*(^)/VA 



l + 7*(2;)/VA 



< 



E [7* (a 



A™/2 



The combination of these yields the first claim. For the second claim, we write: 



E 



log 1 + 



A7*(x)/VA 



1 + 7* (2;' 



< 1 



E 



log 1 + 



A7*(0)/VA 



+ ^E 



< 1 



< 



1 + 7*(0)VA 

y(i)/^/A ^ 

1 + 7*(1)^/A^ 
H \ 2E[7*(0)"'] K E[7*(l)'"] 



A 



log 1 + 



+ 



Va; aW2 ^ AW2 

2E[7*(x)™] 



A™/2 
Here the penultimate inequality follows in the same fashion as for the first claim. 



D 
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5.3 Proof of Lemma 14.31 

For K < l/\/e, the recursive bounds in Eq. (|5.2p yield bounds on the first three moments 
of 7*(1) that are uniform in t. Precisely, we have the following: 

Lemma 5.4. For n < l/\/e, let 7^, = 7^(k) be the smallest positive solution of the equation 



Then we have, for all t > 0: 



7 = Ke^'^, 



E7*(l) < 7* , (5.6) 

]E(7*(1)')<-, (5.7) 

K 

IE(7*(1)')<^. (5.8) 



Moreover, we have for all t > 0: 

Ef (1) < 7* (l + ^) , (5.9) 

mW) < I (i + ^) ' (5-10) 

E(7W) < ^ (l + ^) .. (5.11) 

Proof. We need only prove Eq. (j5.6p since the rest follow trivially from it and Lemma |5.2[ 
The claim of Eq. (|5.6p follows from induction and Lemma 15.21 since i?[7*'(l)] = k < k < 7*, 
and noting that, for 7 < 7*, Kexp(K7) < 7*. D 

The following is a simple consequence of the central limit theorem. 

Lemma 5.5. For any a < 6 € M, a^ > 0, p < 00, there exists uq = nQ{a,b,a'^, p) such 
that the following holds for all n > uq. Let {Wj}i<j<„ he i.i.d. random variables, with 
^{Wi] > a/n, Var(M^i) > a'^/n and E{|l^i|3} < p/n^T'i, Then 



n 



4 = 1 

Proof. Let ag = nE{W2} and cTq = nVar(VFi). By the Berry-Esseen central limit theorem, 
we have 



^[Y,w,>b}>^( ^-"°^ p 



i=i (Jo / agv^ 

b — a 
> $ 



a / a^Wn 



The claim follows by taking ng > p /[a^^{—u)] with u = {b — a)/a. D 

We finally prove a statement that is stronger than Lemma [4.31 since it also controls the 
distribution of 7*(0). 
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Proposition 5.6. Assume k < 1/^/e and let 7^, be defined as per Lem.ma \5.4\ Then there 
exists A=K = A*(k), 5* = (5*(k) > such that, for all A > A*(k) and all t >0, we have: 

P(7*(0)>57*)>5*, 
*(1)<57*)>^. 



where 5* is given by: 



1 / 2^-bg(57*A) 



where <&( • ) is i/ie cumulative distribution function of the standard normal. 

Proof The second bound follows from Markov inequality and Lemma [5.41 for large enough 
A. As for the first one, we have using r*(0) = log7*(0): 

A+i / 



r*+i(o) = iogK+^iogi + - 

It follows from Lemma 15.31 and Lemma 15.41 that: 

E loJi + ^di:^^^ 

% l + 7*(x,)/%/A 



Aa}ixe)/VA 



+ 7*(x^)/^y ' 



> 



2k 



We now lower bound the variance of each summand using the conditional variance given 
7*(x). Since A G {±1} are independent of 7*(x) and uniform, we have: 



Var 






> E 



Var log 1 + 



Aa}{xe)/VA 



1e 

4 






7*(x) 



log 1 + 



27* (x 



A 



>^(log(l + ^) ) P(7*(x) > k/2). 



We have, using Lemma 15.41 that: 

E[7*(x)] > K 
E[j\xf] < 27,K. 

Using the above and the Paley-Zygmund inequality, we get: 



Var 



log 1 + 



Aal{xi)/VA 



1 + Y{xe)/VA 



> 



327* 



log 1 + 



> 



647* A' 

for A large enough. Now, employing Lemma 15.51 we get: 

™,~+/ s N 1 / log(57*/K) — 2k 

P(,'(0) > 51.) > -* (-8-ii2AJ_^ 



D 
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5.4 Proof of Lemma 14.41 

As discussed in Section [H BP minimizes the misclassification error at vertex i among all 
t-local algorithms provided BallG'j^(i; t) is a tree. Equivalently it minimizes the misclassifi- 
cation rate at the root of the regular tree Tree(t). We will prove Lemma [4.4l by proving that 
there exists a local algorithm to estimate the root value on the tree Tree(t), for a suitable 
choice of t with error rate exp(— 0(vA))- Note that since the labeled tree Tree(t) is a 
subtree of Tree(t), the same algorithm is local on Tree(t). Formally we have the following. 

Proposition 5.7. Assume k > l/\/e. Then there exists c* = c*(k) > A* = A*(k) < oo, 
i* = t*(K, A) < oo such that, for all A > A*(k) we can construct a t-^-local decision rule 
F : W^Tree{t*) "> {0, 1} Satisfying 

forxo G {0,1}. 

The rest of this section is devoted to the proof of this proposition. 

The decision rule F is constructed as follows. We write t^ = t*_i -|-t=„,2 with t^:^i and t*^2 
to be chosen below, and decompose the tree Tree(t=K) into its first t*^i generations (that is 
a copy of Tree(t*^i)) and its last t^^2 generations (which consist of A**'^ independent copies 
of Tree(t=i,^2))- We then run a first decision rule based on BP for the copies of Tree(t=i,^2) 
that correspond to the last t^^2 generations. This yields decisions that have a small, but 
independent of A, error probability on the nodes at generation i^,^i. We then refine these 
decisions by running a different algorithm on the first t*^i generations. 

Formally, Proposition 15. 71 follows from the following two lemmas, that are proved next. 
The first lemma provides the decision rule for the nodes at generation t*_i, based on the 
last t^^2 generations. 



Lemma 5.8. Assume k > l/ye and let e > be arbitrary. Then there exists A* = 
A*(k, e) < oo, i*_2 = i*,2('^5 A) < oo such that, for all A > A^:{K,e) there exists a t^:^2-local 
decision rule F2 : Wjree{t^ 2) '"^ F2(Wjree(t, 2)) ^ {0' 1} -siic/i that 

F(F2(VFTree(t,,2)) / ^o|Xo = Xo) < £ , 

forXo G {0,1}. 

The second lemma yields a decision rule for the root, given information on the first 
t=K_i generations, as well as decisions on the nodes at generation t^:^i. In order to state the 
theorem, we denote by Le(t) the set of nodes at generation t. For e = (e(0),e(l)), we 
also let Yi^u ;^)(e) = (^(e))ieLe(t, 1) denote a collection of random variables with values in 
{0,1} that are independent given the node labels -'^Le(t, 1) = (-^i)ieLe(t, 1) ^^d such that, 
for all i, 

F{Yi{£) ^ X,\Xi = x} < e{x) . 

Lemma 5.9. Fix k G (0, 00). There exists A,,(k) = A*(k) < 00, t*^i < 00, c*(k) > and 
£*(«:) > such that, for all A > A* there exists a t^:^i-local decision rule 

Fi : (WTree(t.,i),^Le(t.,i)) ^ Fi(WTree{t.,i)>le(t.,i)) G {0,1} 

satisfying, for any e < e^: 

ff^(Fl(i^Tree(t.,i)^Le(t.,i)) / ^o\Xo = Xo) < e"^'^, 
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5.4.1 Proof of Lemma 15.81 



The decision rule is constructed as follows. We run BP on the tree Tree(t*^2) under consid- 
eration, thus computing the likelihood ratio 7**'^ at the root. We then set F2(Wjree(i, 2)) = 
1(7**'^ > gM)_ Here /I is a threshold to be chosen below. 

In order to analyze this rule, recall that 7*(xo) denotes a random variable whose dis- 
tribution is the same as the conditional distribution of 7* , given the true value of the root 
Xo = Xq. It is convenient to define 

r*(x) = log7*(:E) 

K = log K. 

As stated formally below, in the limit of large A, r*(0) (r*(l)) is asymptotically Gaussian 
with mean /Ui(0) (resp. /Uf(l)) and variance a^. The parameters /Xi(0),/xt(l),(Tt are defined 
by the recursion: 



;,,^i(0) = K-ie2'^'W+^'^S 



1 



;xi+i(l) = K + Ke^*(i)+'^'/2 _ te^^.^(0)+2at 



(J- 



t+1 



,2A*t(0)+2a2 



(5.12) 

(5.13) 
(5.14) 



with initial conditions /io(0) = ^o(l) = ^ and Uq = 0. Formally, we have the following: 
Proposition 5.10. Fix a time t > 0. Then the following limits hold as A — )• oo.- 

r*(o)4/ii(o) + aiZ 



where Z ~ N(0, 1) and fit{0), fJ't{^),crt are defined by the state evolution recursions \5.12\) 
to [5J4\ ). 

Proof. We prove the claims by induction. We have for 1 < i < t — 1: 

A / 



r+i(0) = K + ^log 1 + - 



AlYeM/VA 



+ Ye{xi)/VA 



Considering the first moment, we have: 



E[r 




l + j'{x)/VAj 

-loJl + ^MI^ 
2 ^l l + f(x)/VA 



f{x)/VA 



H log 1 ^ 

2 ^l l + f(x)/v^ 



where we drop the subscript i. Expanding similarly using the distribution of x, we find 
that the quantity inside the expectation converges pointwise as A — )■ 00 to: 



-i\o? 



„2r'(o) 
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Since E[7'(0)^] is bounded by Lemma 15.21 for fixed t, we have by dominated convergence 
and the induction hypothesis that: 

1 



hm Err+i(0)l =K--e 



2 
^i+i(0)- 



Similarly, for the variance we have: 



Var(r+^(0)) = Var VAlog 1 + 



A'Y{x)/VA 



l + Yix)/VA 



Using Var(X) = E(X^) — (EX)'^, we find that the right hand side converges pointwise to 



i(r\\2 



I'm 



„2n(o) 



, yielding as before: 



lim Var(r+^(0)) = e 

A— >c« 



2m,(0)+2(t2 



2 



It follows from the Lindeberg central limit theorem that r*+^ converges in distribution to 
//j+i(0) + <Ti+iZ where Z ~ N(0, 1). For the base case, r''(0) is initialized with the value 
log At = K. This is trivially the (degenerate) Gaussian given by /xo(0) + (TqZ since ^o(O) = k 
and (To = 0. 

Now consider the case of r*"''^(l). We have as before: 



r+i(i) = K + ^iog 1 + - 

£=1 \ 

Computing the first moment: 



Al^}{xe)/VA 



+ Jl{xe)/VA 



E[r+^(1)] = K + AE 



K + AE 



+ 1 



log 1 + 



A'f{x)/VA 



log 1 + 



i + y(x)/vAy 
y(i)/VA 



log 1 + 



l + y(l)/VA 



i + r(o)/vA 



The second term can be handled as before. The first term in the expectation converges 
pointwise to k7*(1) = ne ^^^' . Thus we get by dominated convergence as before that: 



1 



For the variance: 



lim E[r+l(l)] = K + Ke'^»(l)+'^?/2 _ Ag2M.{0)+2a2 
A— >oo 2 

= Mm(0) + Ke'^»«+'^?/2 
= Mi+i(l)- 



Var (r*+^(l)) = A Var log 1 + 



A'f{x)/VA 



l + Y{x)/VA 
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Dealing with each term of the variance computation separately we get: 

Y(l)/^/A 



AE 



,„g. 1^ AV(:.0/^A 



l+y(2:)/VA 



AE 



log' 1 + 



i + y(i)/vA 



+ 1 



log' 1 + 



^Y(0)/VA 



i + r(o)/vA 



Asymptotically in A, the contribution of first term vanishes, while that of the other can 
be computed, using dominated convergence as in the case of r*+^(0), as: 



Mm E 

A— s>oo 



„2n(o) 



,2^J„(0)+2af 



a. 



i+l 



where we use the induction hypothesis. Similarly expanding: 



E 



\/Alog 1 + 



^*y(a;)/\/A 



l + r(a;)/vA 



AE 



+ 1 



log 1 + 



y(i)/vA 



A. 



log 1 + 



A^y(o)/ 



l + f (0)/VA 



In this case, the contribution of both terms goes to zero, hence asymptotically in A the 
expectation above vanishes. It follows, using these computations and the Lindeberg central 
limit theorem that r*"'"^(l) converges in distribution to the limit random variable /ii+i(l) + 
(Tj-i-iZ where Z ~ N(0, 1). The base case for r*"'"^(l) is the same as that for r*+^(0) since 
they are initialized at the (common) value /?. D 

Using the last lemma we can estimate the probability of error that is achieved by 
thresholding the likelihood ratios 7*. 

Corollary 5.11. Let 7* he the likelihood ratio at the root of tree Tree(t). Define "Jl^ = 

(/ii(l) +/ii(0))/2, and set ^2,t{^Jree(t)) = lif7*> e^t and f2,t{^Tree{t)) = oth 

Then, there exists Ao(i) such that, for all A > Ao(t), 

P(F2,t(W^Tree(t)) / ^o|Xo = Xo) < 2 $ 



erwise. 



/^t(l) - /it(0) 
Proof. It is easy to see, from Eqs. (|5.12p and (j5.13p that /it(l) > /it(0). By the last lemma. 



hm P(F2,t(M^Tree(t))/a:o|Xo 
A->-oo ^ ' 



$ 



Mi(l)-^t(0) 



0"t 



D 



and this in turn yields the claim. 

Finally, we have the following simple calculus lemma, whose proof we omit. 

Lemma 5.12. Let UtiO), /Ut(l), fj be defined as per Eqs. (5.12\) . Ii5.13\) . (5J^. If n > 
\l \fe, then 

/^t(l) - /^t(0) 



lim 



OO . 



cr* 



Lemma 15.81 follows from combining this result with Corollary 15.111 and selecting t*,2 
so that $( — (^t(l) — /U((0))/(Tj) < e/4 for t = t*^2- Finally we let A* = Ao(i*,2) as per 
Corollary 15.111 
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5.4.2 Proof of Lemma 15.91 

We construct the decision rule Fi : (VFTree(t.,i),'51_e(t.,i)) ^ Fi(WTree(t.,i)^Le(t,,i)) G {0,1} 
recursively as follows. For each vertex i £ Tree(i) we compute a decision rrii G {0, 1} based 
on the set of descendants of i, do be denoted as D{i), as follows 

I otherwise. 

If i is a leaf, we let rrii = Yi. Finally we take a decision on the basis of the value at the 
root: 



FliW^reeityY,^it)) 



mo 



We recall that the Yi's are conditionally independent given Xi^(^f-y We assume that 
¥{Yi = l\Xi = 0) = P(li = 0\Xi = !)=£, which does not entail any loss of generality 
because it can always be achieved by degrading the decision rule F2. We define the following 
quantities: 

Pt = IP(i^l(W^free(t)'^Le(t)) = l|^o = 0) , (5.16) 

qt = nFi{Wj,,,^^^,Y^eit)) = l\X^ = 1). (5.17) 

Note in particular that po = 1 — qo = e. 

Lemma 15.91 follows immediately from the following. 

Lemma 5.13. Let pt and qt be defined as in Eqs. ()5.16p . (I5.17p . Then there exists e* = 
e^,(/i) and t^ = t^{A,K) < oo, A.^ = A*(k) < oo, c=k = c=k(k) > such that for A > A*.■ 
Pt*<4e-^•^ (5.18) 
l-g*. <4e-^*^. (5.19) 

Proof. Throughout the proof, we use ci, C2, . . . to denote constants that can depend on k 
but not on A or t. We first prove the following by induction: 

Pt+i < e-^i^P' + e-^2A3/2 ^ ^_^3/p^ 

1 - qt+i < e-^'^P' + e-^2^'^' + e'^^/^P*. 

We let Bin(n,p) denote the binomial distribution with parameters n,p. First, let D ~ 
Bin(A,K/-v/A) and, conditional on D: 

Ntr^Bm{A-D,pt) 
Mi~Bin(D,gi). 

Prom the definition of p(, qt and Eq. (j5.15p we observe that: 

/Mt+Nt 
Pt+1 = P 



Vlt+Nt \ 

Y, Wi> kVA/2\ 
i=i J 



qt+i =^\f2Wi+Mt> KyA/2 j , 
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where we let Wi G {±1} are i.i.d and uniformly distributed. 
Considering the pt recursion: 

where A'^ ~ Bin(3A/4,pi) and the penultimate inequality follows from the Chernoff bound 
and positivity of Mf. Using standard Chernoff bounds, for A > 5 we get: 

where ci is a constant dependent only on k. Bounding 1 — qt+i in a similar fashion, we 
obtain: 

1 - qt+i = P ( ^ VFi + Mt < acVA/2 J 

[Y2Wi< -kVA/4: j +F(Mt< 3k^/A/4) . 



< 



We first consider the Mt term: 

^Mt < 3k\/A/4) <f(d< 7k\/A/8) +f(m> k^/A/8 



where M ~ Bin(0, qt). Further, using Chernoff bounds we obtain that this term is less than 
2g-KvA/i28_ rpj^g other term is handled similar to the pt recursion as: 

P ( £ VFi < -kVA/4: ] < P ( iVi < ^^ ) + P I ^ Wi> kVA/4: 

< g-3Apt/128 _^ g-ciA3/2 ^ g_K2/16pt_ 

Consequently we obtain: 

l-qt< e-3Apt/128 _^ g-ciA3/2 _^ g-K2/16pt _^ 26""^/^^^. 

Simple calculus shows that, for all A > A*(k) large enough, there exists e*, cq depen- 
dent on «; but independent of A such that 



-3Ap/128 _^ g-ciA3/2 _^ g-KVl6p ^ 



P 



for all p G [cq/vA, £*]. Since Po = £ ^ £*) this implies that there exists to such that 
Pto < co/\/A- The claim follows by taking t^ = to + 1 which yields that pt, = 0{e~®^^^'). 
Further, observing that the error 1 — qt^ has only an additional 2e~'^^^ component, we 
obtain a similar claim for 1 — g^^ . D 
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A Some tools in probability theory 

This appendix contains some useful facts in probability theory. 

Lemma A.l. Let /i : M — )■ M 6e a hounded function with first three derivatives uniformly 
hounded. Let Xn^k be mutually independent random variahles for 1 < k < n with zero mean 
and variance Vn k ■ Define: 



t'n = ^ Vn,k 
k=l 

n 

bn{e)^Y.^\Xlk\x^^^\>e\ 
k=\ 
n 

^n = / _, Xfi^k- 
k=l 

Also let Qn = N(0,f„). Then, for every n and e > 0: 

\Eh{Sn} - Eh{Gn)\ < f I + ^^^±^\ Vn\\h"'\\oo + 6n\\h"\ 



Proof The lemma is proved using a standard swapping trick. The proof can be found in 
Amir Dembo's lecture notes |Deml3) . D 



Lemma A. 2. Given a random variable X such that E(X) = //. Suppose X satisfies: 
for all X > and some constant p > 0. Then we have for all s > 0: 

where ^ = 7;- { v /"^ + 4s/0 — /^ ) • 



Further, if fj, = 0, we have for t < 1/ep: 

E fe*^" ) < 



tx2^ - 1 



1 — ept 
Proof. By an application of Markov inequality and the given condition on X: 

^{X >t)< e-^*E(e^^) 
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for all A > 0. By a syininetric argument: 

¥{X < -t) < e^*+^^+/'^V2 

By the standard integration formula we have: 

/•oo 

E{\X\') = / st'-^F{\X\ >t)dt 
Jo 

rco rco 

= / st"~^P(X > t) dt + / st'-'^F{X < -t) dt 
Jo Jo 

Optimizing over A yields the desired result. 

If ^ = 0, the optimization yields A = \/s/p. Using this, the Taylor expansion of 
g{x) = e^ and monotone convergence we get: 

oo I. 



ki 

k=0 

oo 



k im 



k\i2kY 
k=o ^ ' 



oo 

fc=0 
1 



1 — ept 



Notice that here we remove the factor of 2 in the inequality, since this is not required for 
even moments of X. D 

The following lemma is standard, see for instance [AKV02[ IVerl2j . 

Lemma A. 3. Let M G M. be a symmetric matrix with entries Mij (for i > j) which 

are centered suhgaussian random variables of scale factor p. Then, uniformly in N: 

P(||M||2 >t)< (5A)^e-^(^-^), 

where A = t^ /IQNpe and ||M||2 denotes the spectral norm (or largest singular value) of M. 

Proof. Divide M into its upper and lower triangular portions M" and M so that M = 
M" + Ml. We deal with each separately. Let mi denote the i row of M . For a unit 
vector X, since Mij are all independent and subgaussian with scale p, it is easy to see that 
{mj,x) are also subgaussian with the same scale. We now bound the square exponential 
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moment of ||M'x|| as follows. For small enough c > 0: 




<{l-epcf. (A.l) 



Using this, we get for any unit vector x: 



^" " - ^ - \NpeJ 



N 

N{t^/Npe-1) 



where we used Markov inequality and Eq. (lA.ip with an appropriate c. Let T be a maximal 
1/2-net of the unit sphere. From a volume packing argument we have that |T| < 5^. Then 
from the fact that g{x) = M x is ||M ||-Lipschitz in x: 



|M'||2 >t)<F ( max||M'x|| > t/2 j 
< |T|P(||M'x|| >t/2). 



The same inequality holds for M". Now using the fact that ||-||2 is a convex function and 
that M" and M' are independent we get: 

PdlMlla >t) <P(||M"||2 > t/2) + P('||M'||2 >t/2] 

< 2 I 5^ (^—] e~N{tyimpe-i)\ _ 



16Npe) 
Substituting for A yields the result. D 

B Additional Proofs 

In this section we provide, for the sake of completeness, some additional proofs that are 
known results. We begin with Proposition II. 1[ 

B.l Proof of Proposition 11.11 

We assume the set Cat is generated as follows: let Xi G {0, 1} be the label of the index 
i € [N]. Then Xi are i.i.d Bernoulli with parameter k/vN and the set Ctv = {i '■ Xi = 1} . 
The model of choosing Cat uniformly random of size k\/N is similar to this model and 
asymptotically in A^ there is no difference. Notice that since ecjy = uqj^/N^'^, we have 
that ||ecjy|P concentrates sharply around k and we are interested in the regime k = 0(1). 
We begin with the first part of the proposition where k = 1 + e. Let Wn = W/^/N, 
Zj\j- = Z/\'N and ec^ = ucj^/N^'^. Since this normalization does not make a difference to 
the eigenvectors of W and Z we obtain from the eigenvalue equation WnVi = Ai^i that: 

ecj^{ecj^,vi) + ZnVi = Xivi. (B.l) 
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Multiplying by vi on either side: 

(ecjv^^i)^ = Ai - {vi,Znvi) 

> Ai - ||^Ar||2- 

Since Z^ = Z/\/N is a standard Wigner matrix with subgaussian entries, [AKV02J yields 
that \\Z\\2 < 2 + (5 with probability at least Cie~'^^^ for some constants Ci{6),ci{6) > 0. 
Further, by Theorem 2.7 of |KY11] we have that Ai > 2 + min(e,e^) with probability at 
least 1 — ]\i-c2iogiogN £qj. gQj^g constant C2 and every A^ sufficiently large. It follows from 
this and the union bound that for A^ large enough, we have: 

{ecN^nf > min(e,e2)/2, 

with probability at least 1 — A^^^"* for some constant C4 > 0. The first claim then follows. 

For the second claim, we start with the same eigenvalue equation (jB.ip . Multiplying on 

either side by ipi, the eigenvector corresponding to the largest eigenvalue of Z]\f we obtain: 

(ecjv'^'i)(ecjv,</'i> + 9i{vi,ipi) = Xi{vi,ipi) , 

where 9i is the eigenvalue of Zj^ corresponding to ipi. With this and Cauchy-Schwartz we 
obtain: 

1/ \i ^ I'^i ~ ^il 

|(ecjv,^i)l < TT-— — ^■ 
\m,ecN)\ 

Let (/) = (log A^)'°s^°s^. Then, using Theorem 2.7 of [KYllj . for any 6 > 0, there exists a 
constant Ci such that |Ai - 9i\ < N-'^+^ with probability at least 1 - iy-^aiogiogAf^ 
Since fi is independent of ec^, we observe that: 



/ N \ 


= N-^l\\ - 


N 


i=i } 


\-e 

N ' 





Ee 

where (p\ (e^ ) denotes the i^^ entry of (pi (resp. ecjv) and Ee(-) is the expectation with 
respect to ecjy holding Zj\f (hence ipi) constant. Using Theorem 2.5 of [KYllj . it follows 
that there exists constants C4^,c^,cq, c-j such that the following two happen with probability 
at least 1 — _/v~'^*^°s'°s^. Firstly, the first expectation above is at most (1 — £)(j)^^N~'^/^. 
Secondly: 

-1/2 

(1 - £)4>^^ 



N 

-N ) 



JEe E('^1^C. 



max|e^ v^il < 



Ari/4 • 

Now, using the Berry-Esseen central limit theorem for {^i,ecf^) that: 

F{\{^,,ecJ\<c{Ny/'-')<^, 

for an appropriate constant c = c(e) and 6 G (0, 1/4). Using this and the earlier bound for 
|Ai — 0i| we obtain that: 

\{ec^,v,)\<cN-'/^+'^' 

with probability at least 1 — c'N~ , for some c' and sufficiently large A^. The claim then 
follows using the union bound and the same argument for the first i eigenvectors. 
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B.2 Proof of Proposition 14.21 

For any fixed t, let £j^ denote the set of vertices in Gn sucli that their t-neighborhoods are 
not a tree, i.e. 

£j^ = {i e [N] : BaWc j^{r,t) is not a tree}. 

For notational simplicity, we will omit the subscript Gn in the neighborhood of i. The 
relative size e^ = \SJ^\/N vanishes asymptotically in N since the sequence {GAr}Ar>i is 
locally tree-like. We let f^BpiW^aw^i-t)) denote the decision according to belief propagation 
at the i^^ vertex. 

From Proposition 14. H Eqs. (14. ip . (14. 2p . (14.51) . (15.11) and induction, we observe that for 

rt . 

-AT- 



P(X, = l\WBanir,t)) d 7\Xi) 

nXi = 0\WBan(i;t)) 



any i G [N]\£l ■ 



We also have that: 

N 

\CnACn\ =Y.KfBp{WBanm) + ^i). 

i=\ 

Using both of these, the fact that e^ — )• and the linearity of expectation, we have the 
first claim: 



lim 

Af-5>oo 



EICtvACatI 



7*(1)< VaW (1 



iV VA V ^ ' ^ ■ V VA 

For any other decision rule F(VFBall(i;i))) we have that: 

EHC^ACjv" 



Ao) > Va) . 



iV 



>(l-eV)lP(F(Vf^free(t))^^°) 



>(l-4 



rree(t) ^ 



since BP computes the correct posterior marginal on the root of the tree Tree(t) and 
maximizing the posterior marginal minimizes the misclassification error. The second claim 
follows by taking the limits. 



B.3 Equivalence of i.i.d and uniform set model 

In Section[2]the hidden set Cat was assumed to be uniformly random given its size. However, 
in Section m we considered a slightly different model to choose Cat, wherein Xi are i.i.d 
Bernoulli random variables with parameter 'k/^/K. This leads to a set Cat = {i : Xj = 1} 
that has a random size, sharply concentrated around Nk/\/A. The uniform set model 
can be obtained from the i.i.d model by simply conditioning on the size |CAr|. In the 
limit of large N it is well-known that these two models are "equivalent". However, for 
completeness, we provide a proof that the results of Proposition 14.21 do not change when 



conditioned on the size |CAr| = 'l2i=iXi- 



E 



CatAC 



N\ 



Nk 



-N\ 



N 



1=1 



Y.^[HWB.m))^x, 



N 



Nk 
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Let S be the event {X]j=i ^j = iVK/vA}- Notice that F(VFBaii(i;t)) is a function of 



{Xj,j E Ball(i;t)} which is a discrete vector of dimension Kt < (A + 1) + . A straight- 
forward direct calculation yields that {Xj,j £ Ball(i; t))!^ converges in distribution to 
{Xj,j G Ball(i;t)) asymptotically in N. This implies VFBaii(i;t)l'S' converges in distribution 
to VFBaii(i;t)- Further, using the locally tree-like property of Gn one obtains: 



lim — E 

N-)-oo N 



CatAC 



N\ 



IC 



Nk 



N\ 



F(ty^ 



ree(t) • 



^Xo 



as required. 
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